Re: Chasing OOM Issues - good sysctl metrics to use?

From: Pete Wright <pete_at_nomadlogic.org>
Date: Fri, 22 Apr 2022 23:46:20 UTC

On 4/22/22 13:39, tech-lists wrote:
> Hi,
>
> On Thu, Apr 21, 2022 at 07:16:42PM -0700, Pete Wright wrote:
>> hello -
>>
>> on my workstation running CURRENT (amd64/32g of ram) i've been running
>> into a scenario where after 4 or 5 days of daily use I get an OOM event
>> and both chromium and firefox are killed.  then in the next day or so
>> the system will become very unresponsive in the morning when i unlock my
>> screensaver, forcing a manual power cycle.
>
> I have the following set in /etc/sysctl.conf on a stable/13 
> workstation. Am using zfs with 32GB RAM.
>
> vm.pageout_oom_seq=120
> vm.pfault_oom_attempts=-1
> vm.pageout_update_period=0
>
> Since setting these here, OOM is a rarity. I don't profess to exactly
> know what they do in detail though. But my experience since these were
> set is hardly any OOM and big users of memory like firefox don't crash.

nice, i will give those a test the next time i hit an OOM, which should be
by next thursday if the pattern continues.

looking at the sysctl descriptions:

vm.pageout_oom_seq: back-to-back calls to oom detector to start OOM
vm.pfault_oom_attempts: Number of page allocation attempts in page fault
    handler before it triggers OOM handling
vm.pageout_update_period: Maximum active LRU update period
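
for anyone following along, here is a rough sketch of how to pull up those
descriptions and the current values on a running system. it assumes FreeBSD's
sysctl(8), where -d prints a tunable's description instead of its value. the
helper name show_oom_sysctls is made up for this example, and the fallback
messages are only there so the loop keeps going on a system that lacks one of
the tunables.

```shell
#!/bin/sh
# Print the description and the current value of each tunable named in the
# quoted sysctl.conf. "show_oom_sysctls" is a hypothetical helper name.
show_oom_sysctls() {
    for name in vm.pageout_oom_seq vm.pfault_oom_attempts vm.pageout_update_period; do
        # -d asks for the description rather than the value (FreeBSD sysctl(8))
        sysctl -d "$name" 2>/dev/null || echo "$name: description not available"
        # without flags, sysctl prints "name: value"
        sysctl "$name" 2>/dev/null || echo "$name: value not available"
    done
}

show_oom_sysctls

# To try a setting without rebooting, run as root, for example:
#   sysctl vm.pageout_oom_seq=120
# and keep the line in /etc/sysctl.conf to have it applied at boot.
```

checking the current values first makes it easy to go back to the defaults if
the new settings don't help.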

i could certainly see how those could be helpful.  in an ideal world i'd 
find the root cause of the system lock-ups, but it would be nice to just 
move on from this :)

cheers,
-p

-- 
Pete Wright
pete@nomadlogic.org
@nomadlogicLA