RPI3 swap experiments ["was killed: out of swap space" with: "v_free_count: 5439, v_inactive_count: 1"]

Mark Millard marklmi at yahoo.com
Thu Aug 9 16:52:25 UTC 2018


[I'm adding what lead up to the first waiting for async swap write
notice.]

On 2018-Aug-9, at 9:44 AM, Mark Millard <marklmi at yahoo.com> wrote:

> On 2018-Aug-9, at 8:21 AM, Mark Johnston <markj at FreeBSD.org> wrote:
> 
>> On Wed, Aug 08, 2018 at 11:56:48PM -0700, bob prohaska wrote:
>>> On Wed, Aug 08, 2018 at 04:48:41PM -0400, Mark Johnston wrote:
>>>> On Wed, Aug 08, 2018 at 08:38:00AM -0700, bob prohaska wrote:
>>>>> The patched kernel ran longer than default but OOMA still halted buildworld around
>>>>> 13 MB. That's considerably farther than a default build world have run but less than
>>>>> observed when setting vm.pageout_oom_seq=120 alone. Log files are at
>>>>> http://www.zefox.net/~fbsd/rpi3/swaptests/r337226M/1gbsdflash_1gbusbflash/batchqueue/
>>>>> 
>>>>> Both changes are now in place and -j4 buildworld has been restarted. 
>>>> 
>>>> Looking through the gstat output, I'm seeing some pretty abysmal average
>>>> write latencies for da0, the flash drive.  I also realized that my
>>>> reference to r329882 lowering the pagedaemon sleep period was wrong -
>>>> things have been this way for much longer than that.  Moreover, as you
>>>> pointed out, bumping oom_seq to a much larger value wasn't quite
>>>> sufficient.
>>>> 
>>>> I'm curious as to what the worst case swap I/O latencies are in your
>>>> test, since the average latencies reported in your logs are high enough
>>>> to trigger OOM kills even with the increased oom_seq value.  When the
>>>> current test finishes, could you try repeating it with this patch
>>>> applied on top? https://people.freebsd.org/~markj/patches/slow_swap.diff
>>>> That is, keep the non-default oom_seq setting and modification to
>>>> VM_BATCHQUEUE_SIZE, and apply this patch on top.  It'll cause the kernel
>>>> to print messages to the console under certain conditions, so a log of
>>>> console output will be interesting.
>>> 
>>> The run finished with a panic, I've collected the logs and terminal output at
>>> http://www.zefox.net/~fbsd/rpi3/swaptests/r337226M/1gbsdflash_1gbusbflash/batchqueue/pageout120/slow_swap/
>>> 
>>> There seems to be a considerable discrepancy between the wait times reported
>>> by the patch and the wait times reported by gstat in the first couple of 
>>> occurrences. The fun begins at timestamp Wed Aug  8 21:26:03 PDT 2018 in
>>> swapscript.log. 
>> 
>> The reports of "waited for swap buffer" are especially bad: during those
>> periods, the laundry thread is blocked waiting for in-flight swap writes
>> to finish before sending any more.  Because the system is generally
>> quite starved for clean pages that it can reuse, it's relying on swap
>> I/O to clean more.  If that fails, the system eventually has no choice
>> but to start killing processes (where the time period corresponding to
>> "eventually" is determined by vm.pageout_oom_seq).
> 
> Just for reference from 
> 
> http://www.zefox.net/~fbsd/rpi3/swaptests/r337226M/1gbsdflash_1gbusbflash/batchqueue/pageout120/slow_swap/swapscript.log
> 
> is some context for two of the swap buffer messages that gives some
> information on the timing of the messages and some other activity:
> 
> Wed Aug  8 22:01:14 PDT 2018
> Device          1K-blocks     Used    Avail Capacity
> /dev/da0b         1048576    23912  1024664     2%
> /dev/mmcsd0s3b    1048576    23892  1024684     2%
> Total             2097152    47804  2049348     2%
> Aug  8 21:36:13 www syslogd: last message repeated 1 times
> Aug  8 21:36:13 www kernel: waited 3s for async swap write
> dT: 10.004s  w: 10.000s
> L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d   %busy Name
>    2     20      1      4   25.5     19   1216   41.4      0      0    0.0   29.3  mmcsd0
>    3     29      1     11   10.8     28   1244   23.9      0      0    0.0   18.7  da0
>    0      0      0      0    0.0      0      3   42.9      0      0    0.0    0.4  mmcsd0s2
>    2     20      1      4   25.6     19   1213   41.5      0      0    0.0   29.4  mmcsd0s3
>    0      0      0      0    0.0      0      3   43.0      0      0    0.0    0.4  ufs/rootfs
>    2     20      1      4   25.6     19   1213   41.6      0      0    0.0   29.4  mmcsd0s3b
>    2     20      1     11   10.9     19   1238   20.6      0      0    0.0   18.6  da0b
> Wed Aug  8 22:01:24 PDT 2018
> Device          1K-blocks     Used    Avail Capacity
> /dev/da0b         1048576    37284  1011292     4%
> /dev/mmcsd0s3b    1048576    36920  1011656     4%
> Total             2097152    74204  2022948     4%
> Aug  8 21:36:13 www syslogd: last message repeated 1 times
> Aug  8 21:36:13 www kernel: waited 3s for async swap write
> dT: 10.002s  w: 10.000s
> L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d   %busy Name
>    0      8      0      0    4.7      8    693   41.7      0      0    0.0   12.2  mmcsd0
>   13     13      0      0    0.0     13    682  132.7      0      0    0.0   78.3  da0
>    0      8      0      0    4.7      8    693   41.8      0      0    0.0   12.2  mmcsd0s3
>    0      8      0      0    4.8      8    693   41.9      0      0    0.0   12.3  mmcsd0s3b
>    5      7      0      0    0.0      7    676   79.5      0      0    0.0   40.1  da0b
> Wed Aug  8 22:01:34 PDT 2018
> Device          1K-blocks     Used    Avail Capacity
> /dev/da0b         1048576    44680  1003896     4%
> /dev/mmcsd0s3b    1048576    44132  1004444     4%
> Total             2097152    88812  2008340     4%
> Aug  8 21:36:13 www kernel: waited 3s for async swap write
> Aug  8 22:01:30 www syslogd: last message repeated 1 times
> Aug  8 22:01:41 www kernel: waited 10s for async swap write
> Aug  8 22:01:41 www kernel: waited 6s for swap buffer
> Aug  8 22:01:41 www kernel: waited 13s for async swap write
> Aug  8 22:01:41 www kernel: waited 3s for swap buffer
> Aug  8 22:01:41 www kernel: waited 12s for async swap write
> Aug  8 22:01:41 www kernel: waited 10s for async swap write
> Aug  8 22:01:41 www kernel: waited 3s for async swap write
> dT: 10.001s  w: 10.000s
> L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d   %busy Name
>    0      2      1     26    3.4      1     17    3.8      0      0    0.0    0.9  mmcsd0
>    0      4      3     61    3.8      2     70    5.7      0      0    0.0    1.7  da0
>    0      2      1     26    3.5      1     17    3.9      0      0    0.0    0.9  mmcsd0s3
>    0      1      0      0    0.0      1     17    4.0      0      0    0.0    0.4  mmcsd0s3a
>    0      1      1     26    3.5      0      0    0.0      0      0    0.0    0.5  mmcsd0s3b
>    0      2      2     54    3.8      0      0    0.0      0      0    0.0    0.9  da0b
>    0      2      0      7    4.3      1     70    5.8      0      0    0.0    0.8  da0d
> Wed Aug  8 22:01:51 PDT 2018
> Device          1K-blocks     Used    Avail Capacity
> /dev/da0b         1048576    46156  1002420     4%
> /dev/mmcsd0s3b    1048576    45480  1003096     4%
> Total             2097152    91636  2005516     4%
> Aug  8 22:01:41 www kernel: waited 10s for async swap write
> Aug  8 22:01:41 www kernel: waited 3s for async swap write
> dT: 10.002s  w: 10.000s
> L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d   %busy Name
>    0      8      7    251    3.4      0      8    2.6      0      0    0.0    2.5  mmcsd0
>    0      9      7    186    4.2      2     77    7.4      0      0    0.0    3.6  da0
>    0      8      7    251    3.5      0      8    2.6      0      0    0.0    2.6  mmcsd0s3
>    0      0      0      0    0.0      0      8    2.7      0      0    0.0    0.1  mmcsd0s3a
>    0      7      7    251    3.5      0      0    0.0      0      0    0.0    2.5  mmcsd0s3b
>    0      7      7    184    4.2      0      0    0.0      0      0    0.0    2.7  da0b
>    0      2      0      2    4.6      2     76    7.5      0      0    0.0    0.9  da0d

Here is what leads to the first waiting for async swap write notice:

dT: 10.002s  w: 10.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d   %busy Name
    0      0      0      0    0.0      0      6    6.8      0      0    0.0    0.1  mmcsd0
    0      1      0      1    2.2      1      9    4.6      0      0    0.0    0.2  da0
    0      0      0      0    0.0      0      6    6.9      0      0    0.0    0.1  mmcsd0s3
    0      0      0      0    0.0      0      6    6.9      0      0    0.0    0.1  mmcsd0s3a
    0      1      0      1    2.3      1      9    5.2      0      0    0.0    0.2  da0d
Wed Aug  8 21:25:32 PDT 2018
Device          1K-blocks     Used    Avail Capacity
/dev/da0b         1048576     6288  1042288     1%
/dev/mmcsd0s3b    1048576     6152  1042424     1%
Total             2097152    12440  2084712     1%
Aug  8 17:16:04 www syslogd: last message repeated 1 times
Aug  8 17:18:59 www su[724]: bob to root on /dev/pts/0
dT: 10.004s  w: 10.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d   %busy Name
    0      4      0      0    0.0      4     74    5.2      0      0    0.0    1.1  mmcsd0
    0      5      1     22    4.2      5     78    6.1      0      0    0.0    1.6  da0
    0      4      0      0    0.0      4     74    5.3      0      0    0.0    1.2  mmcsd0s3
    0      0      0      0    0.0      0      1    4.8      0      0    0.0    0.1  mmcsd0s3a
    0      4      0      0    0.0      4     72    5.4      0      0    0.0    1.1  mmcsd0s3b
    0      4      0      0    0.0      4     71    6.2      0      0    0.0    1.2  da0b
    0      1      1     22    4.3      0      6    4.9      0      0    0.0    0.4  da0d
Wed Aug  8 21:25:42 PDT 2018
Device          1K-blocks     Used    Avail Capacity
/dev/da0b         1048576     6996  1041580     1%
/dev/mmcsd0s3b    1048576     6876  1041700     1%
Total             2097152    13872  2083280     1%
Aug  8 17:16:04 www syslogd: last message repeated 1 times
Aug  8 17:18:59 www su[724]: bob to root on /dev/pts/0
dT: 10.011s  w: 10.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d   %busy Name
    0     50      0      0    0.0     50    740    4.1      0      0    0.0   13.0  mmcsd0
    0     53      0      0    0.0     53    762    7.4      0      0    0.0   15.1  da0
    0     50      0      0    0.0     50    740    4.2      0      0    0.0   13.2  mmcsd0s3
    0     50      0      0    0.0     50    740    4.2      0      0    0.0   13.3  mmcsd0s3b
    0     50      0      0    0.0     50    762    7.3      0      0    0.0   15.1  da0b
Wed Aug  8 21:25:52 PDT 2018
Device          1K-blocks     Used    Avail Capacity
/dev/da0b         1048576    15512  1033064     1%
/dev/mmcsd0s3b    1048576    14800  1033776     1%
Total             2097152    30312  2066840     1%
Aug  8 17:16:04 www syslogd: last message repeated 1 times
Aug  8 17:18:59 www su[724]: bob to root on /dev/pts/0
dT: 10.004s  w: 10.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d   %busy Name
    0     35      0      0    0.0     35   1509   11.2      0      0    0.0   20.3  mmcsd0
    0     43      0      3   44.8     43   1346   50.1      0      0    0.0   55.9  da0
    0     35      0      0    0.0     35   1506   11.4      0      0    0.0   20.4  mmcsd0s3
    0     35      0      0    0.0     35   1503   11.4      0      0    0.0   20.4  mmcsd0s3b
    0      0      0      3   44.8      0      6    3.3      0      0    0.0    0.5  da0a
    0     35      0      0    0.0     35   1336   54.3      0      0    0.0   56.5  da0b
    0      0      0      0    0.0      0      3   1433      0      0    0.0   14.3  da0d
Wed Aug  8 21:26:03 PDT 2018
Device          1K-blocks     Used    Avail Capacity
/dev/da0b         1048576    29448  1019128     3%
/dev/mmcsd0s3b    1048576    30540  1018036     3%
Total             2097152    59988  2037164     3%
Aug  8 17:18:59 www su[724]: bob to root on /dev/pts/0
Aug  8 21:25:57 www kernel: waited 3s for async swap write
dT: 10.002s  w: 10.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d   %busy Name
    0      1      0      1    3.8      1     77   17.5      0      0    0.0    1.1  mmcsd0
    0      2      0      7    3.5      2     84   14.9      0      0    0.0    1.2  da0
    0      0      0      1    3.9      0      3    7.4      0      0    0.0    0.1  mmcsd0s2
    0      1      0      0    0.0      1     74   18.4      0      0    0.0    1.0  mmcsd0s3
    0      0      0      1    4.0      0      3    7.5      0      0    0.0    0.1  ufs/rootfs
    0      1      0      0    0.0      1      9   13.5      0      0    0.0    0.8  mmcsd0s3a
    0      1      0      0    0.0      1     65   23.3      0      0    0.0    0.8  mmcsd0s3b
    0      1      0      1    2.4      1     77   21.6      0      0    0.0    0.8  da0b
    0      1      0      6    3.8      0      8    5.2      0      0    0.0    0.4  da0d



> Based on these latencies, I think the system is behaving more or less as
> expected from the VM's perspective.  I do think the default oom_seq value
> is too low and will get that addressed in 12.0.


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)



More information about the freebsd-arm mailing list