[Bug 277389] Reproduceable low memory freeze on 14.0-RELEASE-p5

From: <bugzilla-noreply_at_freebsd.org>
Date: Tue, 12 Mar 2024 20:48:38 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=277389

--- Comment #10 from Mark Millard <marklmi26-fbsd@yahoo.com> ---
What OOM console messages are being generated? The kernel has
multiple, distinct OOM messages. Which type(s) are you
getting?

"failed to reclaim memory"
"a thread waited too long to allocate a page"
"out of swap space"
"unknown OOM reason %d"

Also, but only for boot verbose:

"proc %d (%s) failed to alloc page on fault, starting OOM\n"

(Note: "out of swap space" would better be described as:
swblk or swpctrie zone exhausted. Such can happen without
the swap space showing as being fully used.)
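
To see which of the reasons listed above applies, one could scan the
saved console log for those message strings. The following is only a
sketch: the function name classify_oom is made up for illustration,
and it assumes the messages end up in a text log (for example
/var/log/messages, or the output of dmesg).

```shell
# Hypothetical helper: read log text on stdin and print each distinct
# OOM reason string (from the list above) that appears in it.
classify_oom() {
    grep -o -E 'failed to reclaim memory|a thread waited too long to allocate a page|out of swap space|unknown OOM reason [0-9]+' | sort -u
}

# Example use (the log path is an assumption about the local setup):
#   classify_oom < /var/log/messages
#   dmesg | classify_oom
```

Each reason is printed once, however many kills it was reported for.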

For "failed to reclaim memory": sysctl -TW vm.pageout_oom_seq=120
(or even larger) could be of use in delaying the OOM activity.
The default is 12. /boot/loader.conf would be a place for such
a tunable. For reference:

# sysctl -Td vm.pageout_oom_seq
vm.pageout_oom_seq: back-to-back calls to oom detector to start OOM
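
As a /boot/loader.conf entry, the setting mentioned above might look
like the following. The value 120 is just the example figure used
above, not a tested recommendation for this system.

```
# Delay "failed to reclaim memory" OOM kills (default is 12).
vm.pageout_oom_seq="120"
```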



Another issue that can happen is user-I/O-related processes
ending up not being runnable because their kernel stacks
were put in the swap space, blocking the processes
from running until the kernel stacks are read back in. In
/etc/sysctl.conf I have:

# Together this pair avoids swapping out the process kernel stacks.
# This also keeps processes used for interacting with the system
# from being hung up by such.
vm.swap_enabled=0
vm.swap_idle_enabled=0

These are live settable via:
sysctl -W vm.swap_enabled=0
sysctl -W vm.swap_idle_enabled=0

(They are not tunables, and so do not go in
/boot/loader.conf .)



For "a thread waited too long to allocate a page"
there are . . .


There also is control over the criteria for this, but it
is more complicated. In /boot/loader.conf (I'm using
defaults):

#
# For plenty of swap/paging space (will not
# run out), avoid pageout delays leading to
# Out Of Memory killing of processes:
#vm.pfault_oom_attempts=-1
#
# For possibly insufficient swap/paging space
# (might run out), increase the pageout delay
# that leads to Out Of Memory killing of
# processes (showing defaults at the time):
#vm.pfault_oom_attempts= 3
#vm.pfault_oom_wait= 10
# (The multiplication is the total but there
# are other potential tradeoffs in the factors
# multiplied, even for nearly the same total.)

If you can be sure of not running out of swap/paging
space, you might try vm.pfault_oom_attempts=-1 .
If you do run out of swap/paging space, it would
deadlock, as I understand. So, if you can tolerate
that, the -1 might be an option even if you do run
out of swap/paging space.

I do not have specific suggestions for alternatives
to 3 and 10. It would be exploratory for me if I had
to try such.
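
To make "the multiplication is the total" concrete, here is a small
sketch. The function name pfault_oom_total is made up for
illustration, and treating the product as the total delay is only
the approximation described above, not an exact model of the kernel
behavior.

```shell
# Hypothetical helper: approximate total seconds a page fault may wait
# before OOM handling starts, given the two tunables.
# attempts = -1 is reported as "unlimited" (no OOM from this path).
pfault_oom_total() {
    attempts=$1
    wait_secs=$2
    if [ "$attempts" -lt 0 ]; then
        echo "unlimited"
    else
        echo $((attempts * wait_secs))
    fi
}

pfault_oom_total 3 10    # the defaults quoted above
pfault_oom_total -1 10   # the -1 setting

# With live values (on the FreeBSD system itself):
#   pfault_oom_total "$(sysctl -n vm.pfault_oom_attempts)" \
#                    "$(sysctl -n vm.pfault_oom_wait)"
```

With the defaults this gives 30 (seconds). Other pairs such as 6 and
5 give the same total but split it into more, shorter waits.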

For reference:

# sysctl -Td vm.pfault_oom_attempts vm.pfault_oom_wait
vm.pfault_oom_attempts: Number of page allocation attempts in page fault
handler before it triggers OOM handling
vm.pfault_oom_wait: Number of seconds to wait for free pages before retrying
the page fault handler
