RPI3 swap experiments ["was killed: out of swap space" with: "v_free_count: 5439, v_inactive_count: 1"]

Mon Aug 6 01:43:51 UTC 2018

On 2018-Aug-5, at 4:15 PM, bob prohaska <fbsd at www.zefox.net> wrote:

> On Sun, Aug 05, 2018 at 10:39:27AM -0700, Mark Millard wrote:
>> 
>> "Total Active Working Set too large" (with lots of swap left) seems to be what folks
>> are running into in these rpi3/rpi2 examples.
>> 
>> 
> Does the size of the working set vary with how the swap is apportioned between
> storage devices? In my observations, 1 GB plus 2 GB swap partitions both on microSD 
> allows a -j4 buildworld to run to completion without OOMA intervention. It looks as
> if both partitions are burdened equally, so usable swap is actually 2 GB.

The less memory in use for other things (especially wired memory),
the more is available for the working set.

The overallocation of swap beyond the maximum recommended might
contribute to how large the working set can be. But I've no clue
of any specific tradeoffs.

Describing 2 or 3 in-use swap partitions instead of just one might
contribute as well. But, again, I've no clue of any specific tradeoffs.

> A pair of 1 GB swap partitions, one on microSD and one on USB flash, invites OOMA 
> to intervene prematurely.

As I remember you have examples that involve only one device and,
separately, you have some problematic devices. Others seem to have
avoided some of the complications: They got the problem in simpler
contexts.

> Stranger still, 1 GB of swap on microSD, which is insufficient (~1.4 GB required), 
> does not trigger OOMA at all, resulting in a hung system.

Part of the problem is that messages like:
(taken from a John Kennedy message)

	Aug  5 01:34:24 rpi3 kernel: pid 63223 (ld.lld), uid 0, was killed: out of swap space
	Aug  5 01:34:26 rpi3 kernel: pid 63360 (c++), uid 0, was killed: out of swap space
	Aug  5 01:34:26 rpi3 kernel: pid 846 (ntpd), uid 123, was killed: out of swap space

are misleading: the issue is not necessarily "out of swap space" but
what the book described (when there is lots of swap around):

QUOTE:
The FreeBSD swap-out daemon will not select a runnable process to swap
out. So, if the set of runnable processes do not fit in memory, the
machine will effectively deadlock. Current machines have enough memory
that this condition usually does not arise. If it does, FreeBSD avoids
deadlock by killing the largest process. If the condition begins to arise
in normal operation, the 4.4BSD algorithm will need to be restored.
END QUOTE.

(The 4.4BSD algorithm is not in use [or even present].)

The truly "out of swap space" case is handled by explicitly different
code that does (to some extent) different things. Trev has reported
that a message that can (sometimes?) be produced in that case looks like:

	Aug  5 17:54:01 sentinel kernel: swap_pager_getswapspace(32): failed

(at least if the system does not hang first). "was killed: out of swap
space" messages without prior "swap_pager_getswapspace(??): failed"
notices should not be trusted for the "out of" wording or its implications.
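
The distinction above can be checked mechanically against a saved log.
What follows is only a minimal sketch, not anything from the kernel or
from the other posters: the function name classify_kills and the rule
"any earlier swap_pager_getswapspace failure in the same log counts"
are assumptions made for illustration, and the patterns are modeled
only on the lines quoted in this message.

```python
import re

# Patterns modeled on the two kinds of log lines quoted in this message.
KILL_RE = re.compile(
    r"pid (\d+) \(([^)]+)\), uid \d+, was killed: out of swap space"
)
FAIL_RE = re.compile(r"swap_pager_getswapspace\(\d+\): failed")


def classify_kills(lines):
    """Return (pid, command, saw_prior_failure) for each kill message.

    saw_prior_failure is True only if a swap_pager_getswapspace failure
    notice appeared earlier in the same sequence of lines. Assumption:
    no time window is applied; any earlier notice counts.
    """
    saw_failure = False
    results = []
    for line in lines:
        if FAIL_RE.search(line):
            saw_failure = True
        else:
            match = KILL_RE.search(line)
            if match:
                results.append(
                    (int(match.group(1)), match.group(2), saw_failure)
                )
    return results


# The three kill messages quoted earlier; no failure notice precedes them.
quoted_kills = [
    "Aug  5 01:34:24 rpi3 kernel: pid 63223 (ld.lld), uid 0, was killed: out of swap space",
    "Aug  5 01:34:26 rpi3 kernel: pid 63360 (c++), uid 0, was killed: out of swap space",
    "Aug  5 01:34:26 rpi3 kernel: pid 846 (ntpd), uid 123, was killed: out of swap space",
]

for pid, command, prior_failure in classify_kills(quoted_kills):
    if prior_failure:
        note = "preceded by a getswapspace failure"
    else:
        note = "no prior getswapspace failure; 'out of' wording suspect"
    print(pid, command, note)
```

For the quoted messages every kill is reported as having no prior
failure notice, which is the case where the "out of swap space" wording
is likely misleading.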

> The RPi3 right now has three swap partitions; one of 1 GB on microSD, one
> of 2 GB on microSD and one of 1 GB on USB. Might it shed any light to try 
> sysctl vm.swap_idle_enabled=1
> just to see if anything changes? My guess is no, but it wouldn't be the first 
> time if I'm wrong.

While I have expectations based on my understanding of what I've
read (including some code) and seen in top, I think you should do
such experiments if you are interested in taking the time. (The
same goes for others with simpler contexts.)

Doing so for the "single device" type of context that showed
the problem would likely be good as well.

You might even want to see how it goes for a context that
worked. ("Does it then still work?" is also good to know.)

I continue to encourage you to use total swap space figures that
are somewhat under what that message happens to be listing as
the maximum at the time (if you are getting such messages).
(You might well be doing so. I've not been monitoring for this.)
Being "over the maximum recommended" just seems to be a
complication of the context that is fairly easily avoided.
There is already a lot involved.
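
As a rough aid for sizing, the amount by which configured swap exceeds
the listed maximum can be computed from the warning text. This is only
a sketch under stated assumptions: the message wording in the pattern
is written from memory and should be checked against the actual console
output, the 4 KiB page size is assumed, the helper name swap_excess_mib
is made up, and the example line uses an invented maximum purely for
illustration.

```python
import re

PAGE_SIZE = 4096  # assumption: 4 KiB pages

# Assumed wording of the kernel warning; verify against real console output.
WARN_RE = re.compile(
    r"total configured swap \((\d+) pages\) "
    r"exceeds maximum recommended amount \((\d+) pages\)"
)


def swap_excess_mib(line, page_size=PAGE_SIZE):
    """Return the MiB by which configured swap exceeds the listed maximum.

    Returns None if the line does not match the assumed warning format.
    """
    match = WARN_RE.search(line)
    if match is None:
        return None
    configured_pages, max_pages = (int(group) for group in match.groups())
    return (configured_pages - max_pages) * page_size / 2**20


# Hypothetical line: 1048576 pages is 4 GiB at 4 KiB per page; the
# 900000-page maximum is invented and not taken from any real system.
example = (
    "warning: total configured swap (1048576 pages) "
    "exceeds maximum recommended amount (900000 pages)."
)
print(swap_excess_mib(example), "MiB over the listed maximum")
```

Trimming total swap by somewhat more than the reported excess would put
the configuration under the maximum listed at that time.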

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)