RPI3 swap experiments ["was killed: out of swap space" with: "v_free_count: 5439, v_inactive_count: 1"]

Mark Millard marklmi at yahoo.com
Thu Aug 9 16:21:47 UTC 2018


On 2018-Aug-9, at 8:21 AM, Mark Johnston <markj at freebsd.org> wrote:

> On Wed, Aug 08, 2018 at 11:56:48PM -0700, bob prohaska wrote:
>> On Wed, Aug 08, 2018 at 04:48:41PM -0400, Mark Johnston wrote:
>>> On Wed, Aug 08, 2018 at 08:38:00AM -0700, bob prohaska wrote:
>>>> The patched kernel ran longer than the default one, but the OOM killer still halted
>>>> buildworld around 13 MB. That's considerably farther than a default buildworld would have
>>>> run, but less than observed when setting vm.pageout_oom_seq=120 alone. Log files are at
>>>> http://www.zefox.net/~fbsd/rpi3/swaptests/r337226M/1gbsdflash_1gbusbflash/batchqueue/
>>>> 
>>>> Both changes are now in place and -j4 buildworld has been restarted. 
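
Of the two changes, vm.pageout_oom_seq is a runtime tunable while
VM_BATCHQUEUE_SIZE is a compile-time constant in the VM headers, so the
latter requires a kernel rebuild. For reference, a minimal userland sketch
(assuming only sysctlbyname(3); not part of the test scripts) that raises
the tunable to the value used in these tests:

#include <sys/types.h>
#include <sys/sysctl.h>

#include <err.h>
#include <stdio.h>

int
main(void)
{
    int oldv, newv = 120;       /* the non-default value used in the tests */
    size_t oldlen = sizeof(oldv);

    /*
     * Read the old value and install the new one in a single call;
     * setting it requires root, reading it does not.
     */
    if (sysctlbyname("vm.pageout_oom_seq", &oldv, &oldlen,
        &newv, sizeof(newv)) == -1)
            err(1, "sysctlbyname(vm.pageout_oom_seq)");
    printf("vm.pageout_oom_seq: %d -> %d\n", oldv, newv);
    return (0);
}

The same effect comes from "sysctl vm.pageout_oom_seq=120" in a root shell.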
>>> 
>>> Looking through the gstat output, I'm seeing some pretty abysmal average
>>> write latencies for da0, the flash drive.  I also realized that my
>>> reference to r329882 lowering the pagedaemon sleep period was wrong -
>>> things have been this way for much longer than that.  Moreover, as you
>>> pointed out, bumping oom_seq to a much larger value wasn't quite
>>> sufficient.
>>> 
>>> I'm curious as to what the worst case swap I/O latencies are in your
>>> test, since the average latencies reported in your logs are high enough
>>> to trigger OOM kills even with the increased oom_seq value.  When the
>>> current test finishes, could you try repeating it with this patch
>>> applied on top? https://people.freebsd.org/~markj/patches/slow_swap.diff
>>> That is, keep the non-default oom_seq setting and modification to
>>> VM_BATCHQUEUE_SIZE, and apply this patch on top.  It'll cause the kernel
>>> to print messages to the console under certain conditions, so a log of
>>> console output will be interesting.
>> 
>> The run finished with a panic; I've collected the logs and terminal output at
>> http://www.zefox.net/~fbsd/rpi3/swaptests/r337226M/1gbsdflash_1gbusbflash/batchqueue/pageout120/slow_swap/
>> 
>> There seems to be a considerable discrepancy between the wait times reported
>> by the patch and the wait times reported by gstat in the first couple of 
>> occurrences. The fun begins at timestamp Wed Aug  8 21:26:03 PDT 2018 in
>> swapscript.log. 
> 
> The reports of "waited for swap buffer" are especially bad: during those
> periods, the laundry thread is blocked waiting for in-flight swap writes
> to finish before sending any more.  Because the system is generally
> quite starved for clean pages that it can reuse, it's relying on swap
> I/O to clean more.  If that fails, the system eventually has no choice
> but to start killing processes (where the time period corresponding to
> "eventually" is determined by vm.pageout_oom_seq).
> 
> Based on these latencies, I think the system is behaving more or less as
> expected from the VM's perspective.  I do think the default oom_seq value
> is too low and will get that addressed in 12.0.
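
To make the mechanism concrete, here is a toy model (simplified, assumed
names; not the actual vm_pageout.c code) of how vm.pageout_oom_seq acts as
the number of consecutive unproductive reclaim passes the page daemon
tolerates before resorting to kills:

#include <stdbool.h>
#include <stdio.h>

static int pageout_oom_seq = 120;   /* the non-default value used in the tests */

/* Returns true when the OOM killer would be invoked. */
static bool
record_pass(bool made_progress, int *failed_passes)
{
    if (made_progress) {
        *failed_passes = 0;         /* any progress resets the count */
        return (false);
    }
    return (++*failed_passes >= pageout_oom_seq);
}

int
main(void)
{
    int failed = 0;

    /* Simulate passes during which slow swap I/O cleans no pages at all. */
    for (int pass = 1; pass <= 150; pass++) {
        if (record_pass(false, &failed)) {
            printf("OOM kill would fire on pass %d\n", pass);
            break;
        }
    }
    return (0);
}

With a swap device this slow, the writes simply do not complete fast enough,
so every pass counts as unproductive and even a large setting only postpones
the kill, which matches what the logs show.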

Would something like the patch that produced messages like:

waited 3s for async swap write
waited 3s for swap buffer

be appropriate if it could be enabled via a sysctl or in some other way?
In other words: in the standard source code, off by default, but able to
be enabled without patching, and possibly without rebuilding?
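
Roughly what I have in mind, as a sketch only (the knob name
vm.swap_wait_report and the helper are hypothetical; this is not the actual
slow_swap.diff): a read/write tunable, off by default, gating the diagnostic
printf:

#include <sys/param.h>
#include <sys/kernel.h>
#include <sys/sysctl.h>
#include <sys/systm.h>

static int swap_wait_report = 0;    /* off by default */
SYSCTL_INT(_vm, OID_AUTO, swap_wait_report, CTLFLAG_RWTUN,
    &swap_wait_report, 0,
    "Report long waits for swap I/O to the console");

/* Hypothetical helper, called where the swap code notices a long sleep. */
static void
report_swap_wait(const char *what, int seconds)
{
    if (swap_wait_report != 0)
        printf("waited %ds for %s\n", seconds, what);
}

With CTLFLAG_RWTUN the knob could also be preset from loader.conf, so no
rebuild would be needed to turn it on.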

I ask because I've been thinking of having such a facility on the FreeBSD
systems where I run buildworld/buildkernel and use poudriere-devel for port
builds. It might warn me of marginal contexts and help explain any OOM
kills that occur. (Some things are difficult or time-consuming to
reproduce.)

If monitored at the time, it might even help identify contexts that
"machine-gun down requests" in environments where such behavior can be a
problem for swapping.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


