RPI3 swap experiments ["was killed: out of swap space" with: "v_free_count: 5439, v_inactive_count: 1"]

Mark Johnston markj at freebsd.org
Mon Aug 6 15:58:45 UTC 2018

On Wed, Aug 01, 2018 at 09:27:31PM -0700, Mark Millard wrote:
> [I have a top-posted introduction here in reply
> to a message listed at the bottom.]
> Bob P. meet Mark J. Mark J. meet Bob P. I'm
> hoping you can help Bob P. use a patch that
> you once published on the lists. This was from:
> https://lists.freebsd.org/pipermail/freebsd-current/2018-June/069835.html
> Bob P. has been having problems with an rpi3
> based buildworld ending up with "was killed:
> out of swap space" even though the swap partitions
> do not seem to be heavily used (as seen via swapinfo
> or watching top).
> > The patch to report OOMA information did its job, very tersely. The console reported
> > v_free_count: 5439, v_inactive_count: 1
> > Aug  1 18:08:25 www kernel: pid 93301 (c++), uid 0, was killed: out of swap space
> > 
> > The entire buildworld.log and gstat output are at
> > http://www.zefox.net/~fbsd/rpi3/swaptests/r336877M/
> > 
> > It appears that at 18:08:21 a write to the USB swap device took 530.5 ms;
> > next, top was killed, and ten seconds later c++ was killed, _after_ da0b
> > was no longer busy.

My suspicion, based on the high latency, is that this is a consequence
of r329882, which lowered the period of time that the page daemon will
sleep while waiting for dirty pages to be cleaned.  If a certain number
of consecutive wakeups and queue scans occur without making progress,
the OOM killer is triggered.  That number is vm.pageout_oom_seq - could
you try increasing it by a factor of 10 and retry your test?

> > This buildworld stopped quite a bit earlier than usual; most of the time
> > the buildworld.log file is close to 20 MB at the time OOMA acts. In this case
> > it was around 13 MB. It's not clear whether that is significant.
> > 
> > If somebody would indicate whether this result is informative, and any possible
> > improvements to the test, I'd be most grateful. 

If the above suggestion doesn't help, the next thing to try would be to
revert the oom_seq value to the default, apply this patch, and see if
the problem continues to occur.  If this doesn't help, please try
applying both measures, i.e., set oom_seq to 120 _and_ apply the patch.

diff --git a/sys/vm/vm_pagequeue.h b/sys/vm/vm_pagequeue.h
index fb56bdf2fdfc..29a16060253f 100644
--- a/sys/vm/vm_pagequeue.h
+++ b/sys/vm/vm_pagequeue.h
@@ -74,7 +74,7 @@ struct vm_pagequeue {
 } __aligned(CACHE_LINE_SIZE);
 struct vm_batchqueue {

