Re: 13-STABLE high idprio load gives poor responsiveness and excessive CPU time per task

In reply to: Mark Millard : "Re: 13-STABLE high idprio load gives poor responsiveness and excessive CPU time per task"
From: Mark Millard <marklmi_at_yahoo.com>
Date: Thu, 29 Feb 2024 18:18:28 UTC
On Feb 29, 2024, at 09:40, Mark Millard <marklmi@yahoo.com> wrote:

> On Feb 29, 2024, at 08:21, Peter <pmc@citylink.dinoex.sub.org> wrote:
> 
>> On Thu, Feb 29, 2024 at 08:02:42AM -0800, Mark Millard wrote:
>> ! Peter 'PMc' Much <pmc_at_citylink.dinoex.sub.org> wrote on
>> ! Date: Thu, 29 Feb 2024 13:40:05 UTC :
>> ! 
>> ! > There is an updated patch in the PR 275594 (5 pieces), that works for
>> ! > 13.3; I have it installed, and only with that I am able to build gcc12
>> ! > - otherwise the system would just OOM-crash (vm.pageout_oom_seq=5120
>> ! > does not help with this).
>> ! 
>> ! The kernel has multiple, distinct OOM messages. Which type are you
>> ! seeing? :
>> ! 
>> ! "a thread waited too long to allocate a page"
>> 
>> That one.
> 
> That explains why vm.pageout_oom_seq=5120 did not make a
> notable difference in the time frame.
> 
> If you do a verbose boot, the code:
> 
>       if (bootverbose)
>               printf(
>           "proc %d (%s) failed to alloc page on fault, starting OOM\n",
>                   curproc->p_pid, curproc->p_comm);
> 
> will likely report which process failed to get a
> page in a timely manner.
> 
> There is also control over the criteria for this, but it
> is more complicated. In /boot/loader.conf (I'm using the
> defaults):
> 
> #
> # For plenty of swap/paging space (will not
> # run out), avoid pageout delays leading to
> # Out Of Memory killing of processes:
> #vm.pfault_oom_attempts=-1
> #
> # For possibly insufficient swap/paging space
> # (might run out), increase the pageout delay
> # that leads to Out Of Memory killing of
> # processes (showing defaults at the time):
> #vm.pfault_oom_attempts= 3
> #vm.pfault_oom_wait= 10
> # (The product is the total, but there
> # are other potential tradeoffs in the factors
> # multiplied, even for nearly the same total.)
> 
> If you can be sure of not running out of swap/paging
> space, you might try vm.pfault_oom_attempts=-1 .
> If you do run out of swap/paging space, it would
> deadlock, as I understand it. So, if you can tolerate
> that, -1 might be an option even if you might run
> out of swap/paging space.
> 
> I do not have specific suggestions for alternatives
> to 3 and 10. Any values I tried would be exploratory
> on my part.
> 
> For reference:
> 
> # sysctl -Td vm.pfault_oom_attempts vm.pfault_oom_wait
> vm.pfault_oom_attempts: Number of page allocation attempts in page fault handler before it triggers OOM handling
> vm.pfault_oom_wait: Number of seconds to wait for free pages before retrying the page fault handler
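
To make the "the product is the total" note concrete, here is a small C sketch. It assumes, as the loader.conf comments above say, that the worst-case wait before the page fault handler starts OOM handling is roughly vm.pfault_oom_attempts times vm.pfault_oom_wait seconds, and that a negative attempts value (such as -1) means retrying indefinitely. The function names are hypothetical illustrations, not kernel interfaces.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: approximate worst-case seconds a faulting thread
 * waits for a free page before page-fault-triggered OOM handling starts.
 * Assumption (from the loader.conf notes): total ~= attempts * wait.
 * Returns -1 for a negative attempts value (e.g. vm.pfault_oom_attempts=-1),
 * meaning this OOM path is never taken; per the note above, that may
 * deadlock instead if swap/paging space actually runs out.
 */
long pfault_oom_budget_seconds(int attempts, int wait_seconds)
{
    assert(wait_seconds >= 0); /* a count of seconds; negative is not meaningful */
    if (attempts < 0)
        return -1; /* unbounded: keep retrying */
    return (long)attempts * (long)wait_seconds;
}

/*
 * Print budgets for a few settings. The defaults (3, 10) and (6, 5) give
 * the same 30 s total but split it differently: more, shorter waits vs.
 * fewer, longer ones (the "tradeoffs in the factors").
 */
void show_pfault_oom_budgets(void)
{
    const int settings[][2] = { {3, 10}, {6, 5}, {-1, 10} };
    for (size_t i = 0; i < sizeof settings / sizeof settings[0]; i++) {
        long total = pfault_oom_budget_seconds(settings[i][0], settings[i][1]);
        if (total < 0)
            printf("attempts=%d wait=%d -> no page-fault OOM (unbounded)\n",
                   settings[i][0], settings[i][1]);
        else
            printf("attempts=%d wait=%d -> about %ld s\n",
                   settings[i][0], settings[i][1], total);
    }
}
```

Calling show_pfault_oom_budgets() from a main() prints one line per setting; it is only arithmetic on the two knobs, not a model of how the kernel's page daemon actually behaves under load.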


I'll note that vm.pageout_oom_seq , vm.pfault_oom_attempts , and
vm.pfault_oom_wait are all writable at runtime, not just boot-time
tunables. In other words, each shows a line of output in:

# sysctl -Wd vm.pageout_oom_seq vm.pfault_oom_attempts vm.pfault_oom_wait
vm.pageout_oom_seq: back-to-back calls to oom detector to start OOM
vm.pfault_oom_attempts: Number of page allocation attempts in page fault handler before it triggers OOM handling
vm.pfault_oom_wait: Number of seconds to wait for free pages before retrying the page fault handler

Not just in:

# sysctl -Td vm.pageout_oom_seq vm.pfault_oom_attempts vm.pfault_oom_wait
vm.pageout_oom_seq: back-to-back calls to oom detector to start OOM
vm.pfault_oom_attempts: Number of page allocation attempts in page fault handler before it triggers OOM handling
vm.pfault_oom_wait: Number of seconds to wait for free pages before retrying the page fault handler

(To see the values, do not use the "d".)
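
Since these are live read-write sysctls, a program can also read them (and, as root, change them) without a reboot. Below is a minimal C sketch, assuming FreeBSD's sysctlbyname(3) and assuming each of the three OIDs holds a plain int. The helper names are hypothetical. On non-FreeBSD systems the helpers just fail with ENOSYS so the sketch still compiles.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#ifdef __FreeBSD__
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Hypothetical helper: read an int-valued sysctl by name.
 * Returns 0 on success, -1 with errno set on failure. */
int read_sysctl_int(const char *name, int *value)
{
#ifdef __FreeBSD__
    size_t len = sizeof *value;
    return sysctlbyname(name, value, &len, NULL, 0);
#else
    (void)name;
    (void)value;
    errno = ENOSYS; /* assumption: no sysctlbyname(3) with these OIDs here */
    return -1;
#endif
}

/* Hypothetical helper: set an int-valued sysctl at runtime (needs root).
 * This works without a reboot only because the OID is writable,
 * not merely a loader.conf tunable. */
int write_sysctl_int(const char *name, int value)
{
#ifdef __FreeBSD__
    return sysctlbyname(name, NULL, NULL, &value, sizeof value);
#else
    (void)name;
    (void)value;
    errno = ENOSYS;
    return -1;
#endif
}

/* Print current values, similar to running sysctl without the "d". */
void print_oom_sysctls(void)
{
    const char *names[] = { "vm.pageout_oom_seq", "vm.pfault_oom_attempts",
                            "vm.pfault_oom_wait" };
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
        int v;
        if (read_sysctl_int(names[i], &v) == 0)
            printf("%s: %d\n", names[i], v);
        else
            perror(names[i]);
    }
}
```

For example, write_sysctl_int("vm.pfault_oom_attempts", -1) would be the programmatic counterpart of the -1 setting discussed above, with the same deadlock caveat if swap/paging space runs out.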


===
Mark Millard
marklmi at yahoo.com