Re: 13-STABLE high idprio load gives poor responsiveness and excessive CPU time per task

From: Mark Millard <marklmi_at_yahoo.com>
Date: Thu, 29 Feb 2024 17:40:39 UTC
On Feb 29, 2024, at 08:21, Peter <pmc@citylink.dinoex.sub.org> wrote:

> On Thu, Feb 29, 2024 at 08:02:42AM -0800, Mark Millard wrote:
> ! Peter 'PMc' Much <pmc_at_citylink.dinoex.sub.org> wrote on
> ! Date: Thu, 29 Feb 2024 13:40:05 UTC :
> ! 
> ! > There is an updated patch in the PR 275594 (5 pieces), that works for
> ! > 13.3; I have it installed, and only with that I am able to build gcc12
> ! > - otherwise the system would just OOM-crash (vm.pageout_oom_seq=5120
> ! > does not help with this).
> ! 
> ! The kernel has multiple, distinct OOM messages. Which type are you
> ! seeing? :
> ! 
> ! "a thread waited too long to allocate a page"
> 
> That one.

That explains why vm.pageout_oom_seq=5120 did not make a
notable difference in the time frame.

If you do a verbose boot, the code:

       if (bootverbose)
               printf(
           "proc %d (%s) failed to alloc page on fault, starting OOM\n",
                   curproc->p_pid, curproc->p_comm);

likely will report which process failed to get a
page in a timely manner.

There is also control over the criteria for this, but it
is more complicated. In /boot/loader.conf (I'm using
defaults):

#
# For plenty of swap/paging space (will not
# run out), avoid pageout delays leading to
# Out Of Memory killing of processes:
#vm.pfault_oom_attempts=-1
#
# For possibly insufficient swap/paging space
# (might run out), increase the pageout delay
# that leads to Out Of Memory killing of
# processes (showing defaults at the time):
#vm.pfault_oom_attempts= 3
#vm.pfault_oom_wait= 10
# (The product of the two is the total delay, but
# there are other potential tradeoffs in the factors
# multiplied, even for nearly the same total.)

If you can be sure of not running out of swap/paging
space, you might try vm.pfault_oom_attempts=-1 .
If you do run out of swap/paging space, it would
deadlock, as I understand. So, if you can tolerate
that, the -1 might be an option even if you might
run out of swap/paging space.

I do not have specific suggestions for alternatives
to 3 and 10. It would be exploratory for me if I had
to try such.
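
To make the "multiplication is the total" point concrete, here is a
minimal sh sketch of the arithmetic. The function name
pfault_total_wait is hypothetical (it is not a FreeBSD tool), and it
assumes the delay before OOM handling is simply attempts times wait,
which is only an approximation of what the kernel does:

```sh
#!/bin/sh
# Hypothetical helper, not part of FreeBSD: approximate how long a page
# fault may wait for a free page before OOM handling starts, taken as
# vm.pfault_oom_attempts * vm.pfault_oom_wait (an assumed simplification).
pfault_total_wait() {
    attempts=$1
    wait_secs=$2
    if [ "$attempts" -lt 0 ]; then
        # A negative attempts value (e.g. -1) means this path never
        # starts OOM handling, so there is no upper bound on the wait.
        echo "unlimited"
    else
        echo $((attempts * wait_secs))
    fi
}

pfault_total_wait 3 10    # the defaults shown above
pfault_total_wait 6 5     # same total, more and shorter retries
pfault_total_wait -1 10   # the "plenty of swap" setting
```

The first two calls give the same total in seconds, which is the
sort of case where the factors could still behave differently in
practice.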

For reference:

# sysctl -Td vm.pfault_oom_attempts vm.pfault_oom_wait
vm.pfault_oom_attempts: Number of page allocation attempts in page fault handler before it triggers OOM handling
vm.pfault_oom_wait: Number of seconds to wait for free pages before retrying the page fault handler


===
Mark Millard
marklmi at yahoo.com