Belated out of swap kill on rpi3 at r359216
Mark Millard
marklmi at yahoo.com
Tue Mar 24 20:22:08 UTC 2020
On 2020-Mar-24, at 11:55, bob prohaska <fbsd at www.zefox.net> wrote:
> On Tue, Mar 24, 2020 at 10:09:28AM -0700, Mark Millard wrote:
>>
>>
>> On 2020-Mar-24, at 08:57, bob prohaska <fbsd at www.zefox.net> wrote:
>>
>>> An attempt to buildworld on an rpi3 running r359216 stopped with an
>>> OOMA kill. Sources were at 359264, loader.conf contained
>>> vm.pageout_oom_seq="4096" .
>>
>> What of the value of vm.pfault_oom_attempts ?
>>
>> If vm.pfault_oom_attempts was not -1,
>> what of the value of vm.pfault_oom_wait ?
>>
> bob at www:/usr/src % sysctl vm.pfault_oom_wait
> vm.pfault_oom_wait: 10
> I didn't change it, this must be the default.
> Would it be useful to add something like
> vm.pfault_oom_wait="20"
> to /boot/loader.conf?
I assume that this answer means vm.pfault_oom_attempts
was not -1. But you did not list what it was.
vm.pfault_oom_attempts=-1 is the way to keep the
vm.pfault_oom_attempts/vm.pfault_oom_wait pair from
causing OOM activity at all. (vm.pfault_oom_wait then
becomes irrelevant.)
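For example, one could check the current value with:
# sysctl vm.pfault_oom_attempts
and, if avoiding page-fault-driven OOM is wanted, put
the following in /boot/loader.conf (my illustration, not
something from your report):
vm.pfault_oom_attempts="-1"
(As far as I know it is also a read/write sysctl, so it
can be changed at runtime with sysctl as well.)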
When vm.pfault_oom_attempts != -1, there are potential
time-outs that overall involve:
vm.pfault_oom_attempts * vm.pfault_oom_wait
but with some tradeoffs in how that total is partitioned
between the 2 factors:
# sysctl -d vm.pfault_oom_attempts
vm.pfault_oom_attempts: Number of page allocation attempts in page fault handler before it triggers OOM handling
# sysctl -d vm.pfault_oom_wait
vm.pfault_oom_wait: Number of seconds to wait for free pages before retrying the page fault handler
As I understand it, the following cases could each have
very different results, depending on what the latencies
are like and such:
vm.pfault_oom_attempts==20 && vm.pfault_oom_wait==1
vs.
vm.pfault_oom_attempts==1 && vm.pfault_oom_wait==20
vs.
vm.pfault_oom_attempts==4 && vm.pfault_oom_wait==5
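If someone wanted to explore that tradeoff, the splits
could be tried in turn at runtime, something like
(assuming, as I remember, that both are read/write
sysctls on this kernel):
# sysctl vm.pfault_oom_attempts=20 vm.pfault_oom_wait=1
# sysctl vm.pfault_oom_attempts=1 vm.pfault_oom_wait=20
# sysctl vm.pfault_oom_attempts=4 vm.pfault_oom_wait=5
(If the defaults are 3 and 10, the 10 matching what you
reported, the default overall figure would be about 30
seconds, for comparison.)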
As I remember, Konstantin Belousov once asked someone
who was having a repeatable problem to try some
alternatives that explored this, but he got no reply
to the request.
The person might have just used vm.pfault_oom_attempts=-1
and continued with their primary activity, for all I know.
vm.pfault_oom_attempts=-1 is only recommended when one
is sure that they will not run out of swap/paging space,
if I understand right.
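A simple way to keep an eye on that, using a tool your
logging script already runs, is just:
# swapinfo
and watching the Used/Avail figures over the course of
the build.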
For reference, some other related settings:
# sysctl -d vm.pageout_oom_seq
vm.pageout_oom_seq: back-to-back calls to oom detector to start OOM
There is also:
# sysctl -d vm.oom_pf_secs
vm.oom_pf_secs:
but it seems to have an empty description. Maybe
it reports vm.pfault_oom_attempts * vm.pfault_oom_wait
when vm.pfault_oom_attempts != -1. (I've not looked.)
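A quick way to check that guess (my guess, not anything
documented) would be to compare the product against the
reported value:
# sysctl -n vm.pfault_oom_attempts vm.pfault_oom_wait vm.oom_pf_secs
and see whether the last figure matches the product of
the first two.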
>>> A snippet of gstat log suggests the worst congestion in the storage I/O
>>> happened at Tue Mar 24 04:52:26 PDT 2020 with an L(q) of 37, but the
>>> OOMA kill happened at Tue Mar 24 04:53:04 PDT 2020, by which time the
>>> L(q) had dropped to one, half a minute later.
>>>
>>> Is the delay in OOMA action to be expected?
>>>
>>> Here's the relevant part of the log, I hope the columns display readably:
>>>
>>> 0/2/2/19177 mbuf clusters in use (current/cache/total/max)
>>> procs memory page disks faults cpu
>>> r b w avm fre flt re pi po fr sr mm0 da0 in sy cs us sy id
>>> 0 0 29 1897320 47312 12851 9 4 5 13330 1669 0 0 14172 8020 3034 65 29 6
>>> dT: 1.056s w: 1.000s
>>> L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps ms/d %busy Name
>>> 37 367 323 5463 6.4 45 1511 76.8 0 0 0.0 91.7 mmcsd0
>>> 37 367 323 5463 6.5 45 1511 76.9 0 0 0.0 93.3 mmcsd0s2
>>> 34 133 111 3209 7.6 22 697 134.7 0 0 0.0 74.0 mmcsd0s2a
>>> 3 235 212 2254 5.9 23 814 21.5 0 0 0.0 70.0 mmcsd0s2b
>>> 34 133 111 3209 7.6 22 697 134.7 0 0 0.0 74.2 ufs/rootfs
>>> Tue Mar 24 04:52:26 PDT 2020
>>> Device 1K-blocks Used Avail Capacity
>>> /dev/mmcsd0s2b 4404252 224484 4179768 5%
>>> Mar 24 04:20:50 www kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 21842, size: 12288
>>> Mar 24 04:20:50 www kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 25477, size: 4096
>>
>> How bad were things back when the "indefinite wait buffer" notices
>> were generated (Mar 24 04:20:50 time frame)?
>>
> It looks like _new_ indefinite wait messages started coming at around Mon Mar 23 23:00:28 PDT 2020:
> procs memory page disks faults cpu
> r b w avm fre flt re pi po fr sr mm0 da0 in sy cs us sy id
> 2 0 18 1588824 197676 14108 2 1 0 14759 238 0 0 14055 9263 2668 62 32 5
> dT: 1.027s w: 1.000s
> L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps ms/d %busy Name
> 9 1 0 0 0.0 1 31 13451 0 0 0.0 326.9 mmcsd0
> 9 1 0 0 0.0 1 31 13451 0 0 0.0 326.9 mmcsd0s2
> 6 1 0 0 0.0 1 31 13451 0 0 0.0 326.9 mmcsd0s2a
> 6 1 0 0 0.0 1 31 13451 0 0 0.0 326.9 ufs/rootfs
> Mon Mar 23 23:00:28 PDT 2020
> Device 1K-blocks Used Avail Capacity
> /dev/mmcsd0s2b 4404252 245288 4158964 6%
> Mar 23 23:00:28 www kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 280324, size: 8192
> Mar 23 23:00:28 www kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 285903, size: 16384
>
> Admittedly, 300% busy looks pretty bad. Still the machine didn't quit.....
> On the next sample %busy went down to 18% for swap, less for all else.
As I understand it, the ms/w value of 13451 means that,
for a time, a write took (a mean of?) somewhat under 14
seconds from being queued to being completed.
If a read could be blocked for such time frames by such
writes, it suggests that vm.pfault_oom_wait might need to
be larger than 13 if vm.pfault_oom_attempts=-1 is not in
use.
Or vm.pfault_oom_attempts=-1 could be used to keep such
large time frames from directly leading to OOM activity.
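In /boot/loader.conf terms, that would be something like
one of (illustrative figures only, based on the ~14
second observation above):
vm.pfault_oom_wait="20"
or:
vm.pfault_oom_attempts="-1"
(with -1 in place, the wait value no longer matters).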
>>
>> Below I show code changes to be more explicit in the
>> output messaging about what contributes to initiating
>> OOM kills, without needing boot verbose or the like.
>> There are also some messages from Mark J.'s old code
>> for reporting on things related to initiating OOM
>> kills, or some minor variations of them.
>>
>
>> You may want to try such changes to provide more
>> context for your OOM kills when they happen. Below
>> the 4 reasons reported are (showing the most
>> detailed of the related messages from different
>> stages):
>>
>> swp_pager_meta_build: swap blk uma zone exhausted
>>
>> swp_pager_meta_build: swap pctrie uma zone exhausted
>>
>> vm_fault_allocate: proc %d (%s) failed to alloc page on fault, starting OOM
>>
>> m_pageout_mightbe_oom: kill context: v_free_count: %u, v_inactive_count: %u
>>
>>
>
> Could inquiries instead be added to the logging script?
> Right now it's
> #!/bin/sh
> while true
> do vmstat ; gstat -abd -I 1s ; date ; swapinfo ; tail -n 2 /var/log/messages ; netstat -m | grep "mbuf clusters"
> done
>
> I'd much rather tamper with the logging script than the kernel 8-)
Unfortunately, the reason I made the kernel messaging
changes is that, as far as I know, the normal kernel
simply does not publish the information anywhere when
the internal criteria lead to OOM activity: no such
inquiry is available without kernel changes.
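What the stock kernel does log is the kill itself (just
not the specific trigger), so about all a script can
capture without kernel changes is something like:
# grep "was killed:" /var/log/messages
(assuming the usual "pid ... was killed: ..." wording of
the kill notice on this branch).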
===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)