RPI3 swap experiments ["was killed: out of swap space" with: "v_free_count: 5439, v_inactive_count: 1"]

Sun Aug 12 23:23:52 UTC 2018

On 2018-Aug-12, at 3:40 PM, bob prohaska <fbsd at www.zefox.net> wrote:

> On Sun, Aug 12, 2018 at 10:32:48AM -0700, John Kennedy wrote:
>> . . .
> Setting vm.pageout_oom_seq to 120 made a decisive improvement, almost allowing
> buildworld to finish. By the time I tried CAM_IOSCHED_DYNAMIC buildworld was
> getting only about half as far, so it seems the patches were harmful to a degree.
> Changes were applied in the order 

You could experiment with figures bigger than 120 for
vm.pageout_oom_seq .

I'll note that this mechanism appears to have been
introduced in -r290920 ; see:

https://lists.freebsd.org/pipermail/svn-src-head/2015-November/078968.html

In part it says:

  . . . only raise OOM when pagedaemon is unable to produce a free
  page in several back-to-back passes.  Track the failed passes per
  pagedaemon thread.

  The number of passes to trigger OOM was selected empirically and
  tested both on small (32M-64M i386 VM) and large (32G amd64)
  configurations.  If the specifics of the load require tuning, sysctl
  vm.pageout_oom_seq sets the number of back-to-back passes which must
  fail before OOM is raised.  Each pass takes 1/2 of seconds.  Less the
  value, more sensible the pagedaemon is to the page shortage.
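The back-to-back-pass rule quoted above can be modeled as a small
counter. What follows is a simplified, hypothetical C sketch, not the
actual vm_pageout.c code. The struct and function names are invented for
illustration. Only the idea of a threshold of consecutive failed passes,
each taking about 1/2 second, comes from the commit text. Resetting the
count after raising OOM is also an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-pagedaemon-thread state (not the real kernel struct). */
struct pd_state {
	int oom_seq;    /* threshold, in the role of vm.pageout_oom_seq */
	int failed_seq; /* consecutive passes so far that freed no page */
};

/*
 * Record the outcome of one pagedaemon pass (roughly 1/2 second each).
 * Returns true when OOM should be raised, i.e. when oom_seq passes in a
 * row have failed to produce a free page. Any successful pass resets the
 * count, so only an uninterrupted run of failures can trigger OOM.
 */
static bool
pd_record_pass(struct pd_state *pd, bool produced_free_page)
{
	assert(pd->oom_seq > 0);
	if (produced_free_page) {
		pd->failed_seq = 0;
		return false;
	}
	pd->failed_seq++;
	if (pd->failed_seq >= pd->oom_seq) {
		pd->failed_seq = 0; /* sketch's choice: count afresh after OOM */
		return true;
	}
	return false;
}
```

With oom_seq = 12, eleven failed passes followed by one success never
raise OOM. Twelve failures in a row do, about 6 seconds of continuous
page shortage.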

The code shows:

int vmd_oom_seq

so it looks like fairly large values would be
tolerated. You may be able to raise the value
far enough that the problem no longer shows up
in your context.
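A rough idea of what a given setting means in wall-clock time follows
from the commit text's "Each pass takes 1/2 of seconds". The C sketch
below assumes exactly 0.5 s per pass and an `int`-sized setting. The
function names are hypothetical. The real per-pass time can vary with
load.

```c
#include <assert.h>
#include <limits.h>

/*
 * Approximate seconds of continuous failed pagedaemon passes needed
 * before OOM is raised, assuming each pass takes about 1/2 second.
 */
static double
oom_delay_seconds(int oom_seq)
{
	assert(oom_seq >= 0);
	return oom_seq * 0.5;
}

/* Upper bound if the setting is limited to the range of an int. */
static double
max_oom_delay_seconds(void)
{
	return oom_delay_seconds(INT_MAX);
}
```

Under these assumptions the default 12 is about 6 seconds, 120 is about
60 seconds, and INT_MAX would be on the order of 34 years. The range of
an `int` is not the practical limit.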

> pageout 
> batchqueue
> slow_swap
> iosched

For my new Pine64+ 2GB experiments I've only applied
the Mark J. reporting patches, not the #define one.
Nor have I involved CAM_IOSCHED_DYNAMIC.

But with 2 GiBytes of RAM and the default 12 for
vm.pageout_oom_seq I got:

v_free_count: 7773, v_inactive_count: 1
Aug 12 09:30:13 pine64 kernel: pid 80573 (c++), uid 0, was killed: out of swap space

with no other reports from Mark Johnston's reporting
patches.

It appears that long I/O latencies as seen by the
subsystem are not necessary for ending up with OOM
kills, even if they can contribute when they occur.

(7773 * 4 KiBytes = 31,838,208 Bytes, by the way.)
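The page-count-to-bytes conversion can be checked mechanically. This C
sketch assumes a 4 KiB page size, as the calculation above does. The
helper name is hypothetical. It uses a 64-bit type so that large page
counts do not overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size: 4 KiBytes, as used in the calculation above. */
#define ASSUMED_PAGE_SIZE 4096u

/* Convert a page count (e.g. v_free_count) to bytes. */
static uint64_t
pages_to_bytes(uint64_t pages, uint64_t page_size)
{
	assert(page_size > 0);
	return pages * page_size;
}
```

pages_to_bytes(7773, ASSUMED_PAGE_SIZE) gives 31,838,208 bytes, about
30.4 MiBytes. Assuming the same page size, the RPI3's v_free_count of
5439 from the subject line is 22,278,144 bytes, about 21.2 MiBytes.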

> My RPI3 is now updating to 337688 with no patches/config changes. I'll start the
> sequence over and would be grateful if anybody could suggest a better sequence.

Side note: more text from -r290920 :

  In future, some heuristic to calculate the value of the tunable might
  be designed based on the system configuration and load.  But before it
  can be done, the i/o system must be fixed to reliably time-out
  pagedaemon writes, even if waiting for the memory to proceed.  Then,
  code can account for the in-flight page-outs and postpone OOM until
  all of them finished, which should reduce the need in tuning.  Right
  now, ignoring the in-flight writes and the counter allows to break
  deadlocks due to write path doing sleepable memory allocations.

I've no clue if this ever progressed after -r290920 .
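For illustration only, the future heuristic described in that quoted
text might look like the C sketch below. The function is hypothetical
and is not taken from any FreeBSD revision. It assumes the i/o system
guarantees that in-flight page-outs eventually finish or time out.
Without that guarantee, waiting on them could reintroduce the deadlocks
that the plain counter currently breaks.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical OOM decision that also accounts for in-flight page-outs:
 * even after oom_seq back-to-back failed passes, hold off while
 * page-out writes are still outstanding, since their completion may
 * free pages. Only safe if those writes are guaranteed to complete or
 * time out.
 */
static bool
should_raise_oom(int failed_passes, int oom_seq, int inflight_pageouts)
{
	assert(oom_seq > 0 && failed_passes >= 0 && inflight_pageouts >= 0);
	if (failed_passes < oom_seq)
		return false; /* not enough consecutive failures yet */
	return inflight_pageouts == 0; /* postpone until writes drain */
}
```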

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)