Re: Chasing OOM Issues - good sysctl metrics to use?
Date: Wed, 11 May 2022 00:49:49 UTC
On 2022-May-10, at 11:49, Mark Millard <marklmi@yahoo.com> wrote:
> On 2022-May-10, at 08:47, Jan Mikkelsen <janm@transactionware.com> wrote:
>
>> On 10 May 2022, at 10:01, Mark Millard <marklmi@yahoo.com> wrote:
>>>
>>> On 2022-Apr-29, at 13:57, Mark Millard <marklmi@yahoo.com> wrote:
>>>
>>>> On 2022-Apr-29, at 13:41, Pete Wright <pete@nomadlogic.org> wrote:
>>>>>
>>>>>> . . .
>>>>>
>>>>> d'oh - went out for lunch and workstation locked up. i *knew* i shouldn't have said anything lol.
>>>>
>>>> Any interesting console messages ( or dmesg -a or /var/log/messages )?
>>>>
>>>
>>> I've been doing some testing of a patch by tijl at FreeBSD.org
>>> and have reproduced both hang-ups (ZFS/ARC context) and kills
>>> (UFS/noARC and ZFS/ARC) for "was killed: failed to reclaim
>>> memory", both with and without the patch. This is with only a
>>> tiny fraction of the swap partition(s) enabled being put to
>>> use. So far, the testing was deliberately with
>>> vm.pageout_oom_seq=12 (the default value). My testing has been
>>> with main [so: 14].
>>>
>>> But I also learned how to avoid the hang-ups that I got --but
>>> it costs making kills more likely/quicker, other things being
>>> equal.
>>>
>>> I discovered that the hang-ups that I got were from all the
>>> processes that I use to interact with the system ending up
>>> with their kernel threads swapped out and not being swapped
>>> back in (including sshd, so no new ssh connections). In
>>> some contexts I only had escaping into the kernel debugger
>>> available, not even ^T would work. Other times ^T did work.
>>>
>>> So, when I'm willing to risk kills in order to maintain
>>> the ability to interact normally, I now use in
>>> /etc/sysctl.conf :
>>>
>>> vm.swap_enabled=0
>>
>> I have been looking at an OOM related issue. Ignoring the actual leak, the problem leads to a process being killed because the system was out of memory. This is fine. After that, however, the system console was black with a single block cursor and the console keyboard was unresponsive. Caps lock and num lock didn’t toggle their lights when pressed.
>>
>> Using an ssh session, the system looked fine. USB events for the keyboard being disconnected and reconnected appeared but the keyboard stayed unresponsive.
>>
>> Setting vm.swap_enabled=0, as you did above, resolved this problem. After the process was killed a perfectly normal console returned.
>>
>> The interesting thing is that this test system is configured with no swap space.
>>
>> This is on 13.1-RC5.
>>
>>> This disables swapping out of process kernel stacks. It
>>> is just that, with that option removed for gaining free RAM,
>>> there are fewer options tried before a kill is initiated. It
>>> is not a loader-time tunable but is writable, thus the
>>> /etc/sysctl.conf placement.
>>
>> Is that really what it does? From a quick look at the code in vm/vm_swapout.c, it seems a little more complex.
>
> I was going by its description:
>
> # sysctl -d vm.swap_enabled
> vm.swap_enabled: Enable entire process swapout
>
> Based on the below, it appears that the description
> presumes vm.swap_idle_enabled==0 (the default). In
> my context vm.swap_idle_enabled==0 . Looks like I
> should also list:
>
> vm.swap_idle_enabled=0
>
> in my /etc/sysctl.conf with a reminder comment that the
> pair of =0's are required for avoiding the observed
> hang-ups.
>
>
> The analysis goes like . . .
>
> I see in the code that vm.swap_enabled !=0 causes
> VM_SWAP_NORMAL :
>
> void
> vm_swapout_run(void)
> {
>
>     if (vm_swap_enabled)
>         vm_req_vmdaemon(VM_SWAP_NORMAL);
> }
>
> and that in turn leads vm_daemon to do:
>
> if (swapout_flags != 0) {
>     /*
>      * Drain the per-CPU page queue batches as a deadlock
>      * avoidance measure.
>      */
>     if ((swapout_flags & VM_SWAP_NORMAL) != 0)
>         vm_page_pqbatch_drain();
>     swapout_procs(swapout_flags);
> }
>
> Note: vm.swap_idle_enabled==0 && vm.swap_enabled==0 ends
> up with swapout_flags==0. vm.swap_idle. . . defaults seem
> to be (in my context):
>
> # sysctl -a | grep swap_idle
> vm.swap_idle_threshold2: 10
> vm.swap_idle_threshold1: 2
> vm.swap_idle_enabled: 0
>
> For reference:
>
> /*
>  * Idle process swapout -- run once per second when pagedaemons are
>  * reclaiming pages.
>  */
> void
> vm_swapout_run_idle(void)
> {
>     static long lsec;
>
>     if (!vm_swap_idle_enabled || time_second == lsec)
>         return;
>     vm_req_vmdaemon(VM_SWAP_IDLE);
>     lsec = time_second;
> }
>
> [So vm.swap_idle_enabled==0 avoids VM_SWAP_IDLE status.]
>
> static void
> vm_req_vmdaemon(int req)
> {
>     static int lastrun = 0;
>
>     mtx_lock(&vm_daemon_mtx);
>     vm_pageout_req_swapout |= req;
>     if ((ticks > (lastrun + hz)) || (ticks < lastrun)) {
>         wakeup(&vm_daemon_needed);
>         lastrun = ticks;
>     }
>     mtx_unlock(&vm_daemon_mtx);
> }
>
> [So VM_SWAP_IDLE and VM_SWAP_NORMAL are independent bits
> in vm_pageout_req_swapout.]
>
> vm_daemon does:
>
> mtx_lock(&vm_daemon_mtx);
> msleep(&vm_daemon_needed, &vm_daemon_mtx, PPAUSE, "psleep",
>     vm_daemon_timeout);
> swapout_flags = vm_pageout_req_swapout;
> vm_pageout_req_swapout = 0;
> mtx_unlock(&vm_daemon_mtx);
>
> So vm_pageout_req_swapout is regenerated after that
> each time.
>
> I'll not show the code for vm.swap_idle_enabled!=0 .
>
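
For illustration only, here is a small userspace model of the
request/consume logic quoted above. It is a sketch, not kernel
code: the flag values, the model_* helpers and flags_for() are
assumptions made up for this example, locking and the wakeup
rate limiting are left out, and only the two sysctl-backed
variables and vm_pageout_req_swapout keep their kernel names.
It just shows that, with both vm.swap_enabled and
vm.swap_idle_enabled at 0, a vm_daemon pass sees
swapout_flags==0 and so would not call swapout_procs().

```c
/* Illustrative flag values (assumed); the kernel headers define the real ones. */
#define VM_SWAP_NORMAL 1
#define VM_SWAP_IDLE   2

/* Stand-ins for the two sysctl-controlled variables. */
static int vm_swap_enabled;
static int vm_swap_idle_enabled;

/* Pending request bits, accumulated between vm_daemon passes. */
static int vm_pageout_req_swapout;

/* Models vm_req_vmdaemon(): OR the request into the pending bits. */
static void
model_req_vmdaemon(int req)
{
    vm_pageout_req_swapout |= req;
}

/* Models vm_swapout_run(): request only if vm.swap_enabled != 0. */
static void
model_swapout_run(void)
{
    if (vm_swap_enabled)
        model_req_vmdaemon(VM_SWAP_NORMAL);
}

/* Models vm_swapout_run_idle(), ignoring the once-per-second limit. */
static void
model_swapout_run_idle(void)
{
    if (!vm_swap_idle_enabled)
        return;
    model_req_vmdaemon(VM_SWAP_IDLE);
}

/* Models one vm_daemon pass: take the pending bits, then clear them. */
static int
model_daemon_pass(void)
{
    int swapout_flags = vm_pageout_req_swapout;

    vm_pageout_req_swapout = 0;
    return (swapout_flags);
}

/* Hypothetical helper: flags seen by one pass for given sysctl values. */
static int
flags_for(int swap_enabled, int swap_idle_enabled)
{
    vm_swap_enabled = swap_enabled;
    vm_swap_idle_enabled = swap_idle_enabled;
    model_swapout_run();
    model_swapout_run_idle();
    return (model_daemon_pass());
}
```

In this model flags_for(0, 0) is 0, while turning either
setting on leaves its own bit set for the next pass. That
is why I want both =0 lines in /etc/sysctl.conf.
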
Well, with continued experiments I got an example of
a hang-up for which looking via the db> prompt did not
show any swapping out of process kernel stacks
( vm.swap_enabled=0 was the context, so that is expected ).
The environment was ZFS (so with ARC).
But this was testing with vm.pageout_oom_seq=120 instead
of the default vm.pageout_oom_seq=12 . It may be that,
left to sit long enough, things would have unhung (as
seen from an external perspective).
It is part of what I'm experimenting with, so we will see.
===
Mark Millard
marklmi at yahoo.com