Re: RELENG_14 [process] was killed: failed to reclaim memory

From: Mark Millard <marklmi_at_yahoo.com>
Date: Wed, 15 Nov 2023 18:39:50 UTC
On Nov 15, 2023, at 08:58, mike tancsa <mike@sentex.net> wrote:

> On 11/14/2023 8:25 PM, Mark Millard wrote:
>> mike tancsa <mike_at_sentex.net> wrote on
>> Date: Tue, 14 Nov 2023 13:44:22 UTC :
>> 
>>> While testing some new hardware on a recent RELENG_14 image (from Nov
>>> 10th), I noticed some of my ssh sessions would get killed off with the
>>> errors below (twice in 24hrs)
>>> 
>>> pid 1697 (sshd), jid 0, uid 1001, was killed: failed to reclaim memory
>>> pid 6274 (sshd), jid 0, uid 1001, was killed: failed to reclaim memory
>>> . . .
>> [My notes below are not specific to releng/14.0 or to
>> stable/14 .]
>> 
>> What do you have for ( copied from my /boot/loader.conf ):
> 
> Thanks, Mark. No tuning in there other than forcing a particular driver to attach.
> 
> # cat /boot/loader.conf
> kern.geom.label.disk_ident.enable="0"
> kern.geom.label.gptid.enable="0"
> cryptodev_load="YES"
> zfs_load="YES"
> hw.mfi.mrsas_enable=1
> t5fw_cfg_load="YES"
> if_cxgbe_load="YES"
> #
> 
> 
> 
>> #
>> # Delay when persistent low free RAM leads to
>> # Out Of Memory killing of processes:
>> vm.pageout_oom_seq=120
>> 
>> The default is 12 (last I knew, anyway).
> 
> Any thoughts for a machine with a lot of RAM? Am I better off limiting ARC or changing the default to 120?
> 

I have vm.pageout_oom_seq=120 everywhere, from the little Arm boards to the
Threadripper 1950X with 128 GiBytes of RAM. I've hit the kills in all of these
contexts, even UFS-based on the 1950X (no ARC competing for RAM). (This was with
high-load-average bulk -a test runs, using USE_TMPFS=data and even USE_TMPFS=no.)

(My bulk -a testing is rare.)
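To illustrate why raising vm.pageout_oom_seq delays the "failed to reclaim memory" kills, here is a simplified, hypothetical model in Python. It is not FreeBSD's actual code: the names DomainModel and first_oom_pass are made up for illustration, and it assumes, as a simplification, that the page daemon counts consecutive reclaim passes that made no progress against a free-page shortage, resets that count whenever a pass makes progress (or there is no shortage), and starts OOM killing once the count reaches vm.pageout_oom_seq. Per-domain voting and pass timing are ignored.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class DomainModel:
    """Hypothetical model of one page daemon domain (simplified).

    oom_seq plays the role of the vm.pageout_oom_seq tunable;
    seq counts consecutive passes that made no progress.
    """
    oom_seq: int = 12
    seq: int = 0

    def record_pass(self, starting_shortage: int, remaining_shortage: int) -> bool:
        """Record one reclaim pass; return True if it would trigger an OOM kill."""
        if starting_shortage <= 0 or remaining_shortage != starting_shortage:
            # No shortage, or some pages were reclaimed: the streak resets.
            self.seq = 0
            return False
        # Shortage present and nothing was reclaimed during this pass.
        self.seq += 1
        if self.seq < self.oom_seq:
            return False
        # Threshold reached: reset the counter and report an OOM decision.
        self.seq = 0
        return True


def first_oom_pass(oom_seq: int,
                   passes: Iterable[Tuple[int, int]]) -> Optional[int]:
    """Return the 1-based pass number that first triggers OOM, or None.

    Each item in passes is (shortage at start of pass, shortage at end).
    """
    domain = DomainModel(oom_seq=oom_seq)
    for number, (start, remaining) in enumerate(passes, 1):
        if domain.record_pass(start, remaining):
            return number
    return None


# A stall of 30 passes that reclaim nothing against a 100-page shortage:
stalled = [(100, 100)] * 30
print(first_oom_pass(12, stalled))   # OOM on pass 12 with the default
print(first_oom_pass(120, stalled))  # no OOM within 30 passes at 120
```

In this model a single pass that reclaims anything resets the streak, so a larger oom_seq only gives a persistently stalled system more passes to recover before a kill; it does not prevent kills when memory truly cannot be reclaimed.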

===
Mark Millard
marklmi at yahoo.com