RELENG_14 [process] was killed: failed to reclaim memory

From: Mark Millard <marklmi_at_yahoo.com>
Date: Wed, 15 Nov 2023 01:25:18 UTC

mike tancsa <mike_at_sentex.net> wrote on
Date: Tue, 14 Nov 2023 13:44:22 UTC :

> While testing some new hardware on a recent RELENG_14 image (from Nov 
> 10th), I noticed some of my ssh sessions would get killed off with the 
> errors below (twice in 24hrs)
> 
> pid 1697 (sshd), jid 0, uid 1001, was killed: failed to reclaim memory
> pid 6274 (sshd), jid 0, uid 1001, was killed: failed to reclaim memory
> . . .

[My notes below are not specific to releng/14.0 or to
stable/14 .]

What value do you have for the following (copied from my
/boot/loader.conf)?

#
# Delay when persistent low free RAM leads to
# Out Of Memory killing of processes:
vm.pageout_oom_seq=120

The default is 12 (last I knew, anyway).

The 120 figure has allowed me and others to complete
buildworld, buildkernel, and poudriere bulk runs using
all cores on small arm boards. With the default, those
runs had processes killed with "failed to reclaim memory"
(to use the modern, improved [not misleading] message
text).

(The 120 is not in time units: it is more like a number
of (re)tries to gain at least a target amount of Free RAM
before failure handling starts. The comment in my
loader.conf mentions a delay because a longer delay is a
consequence of assigning a larger value.)

The 120 is not a maximum, just a figure that has proved
useful in various contexts.
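
For checking what is actually configured, here is a minimal
shell sketch. It assumes the tunable is assigned in a
loader.conf-style file; oom_seq_from_conf is a hypothetical
helper name, not a FreeBSD tool. As far as I know,
vm.pageout_oom_seq can also be read and changed at runtime
via sysctl(8), but treat the commented commands as
assumptions to verify on your own system.

```shell
#!/bin/sh
# Print the last vm.pageout_oom_seq value assigned in a
# loader.conf-style file. Prints nothing if the tunable is
# not assigned there (or is only present in a comment).
# NOTE: oom_seq_from_conf is a hypothetical helper name.
oom_seq_from_conf() {
    # First expression strips "#" comments; second matches
    # name=value (value optionally double-quoted) and prints
    # only the numeric value.
    sed -n -e 's/#.*$//' \
           -e 's/^[[:space:]]*vm\.pageout_oom_seq[[:space:]]*=[[:space:]]*"\{0,1\}\([0-9][0-9]*\)"\{0,1\}[[:space:]]*$/\1/p' \
        "$1" | tail -n 1
}

# Intended use on a FreeBSD system (assumed, not run here):
#   oom_seq_from_conf /boot/loader.conf   # value set at boot, if any
#   sysctl vm.pageout_oom_seq             # value currently in effect
#   sysctl vm.pageout_oom_seq=120         # change at runtime, as root
```

If the helper prints nothing, the file does not assign the
tunable, and the value in effect would be whatever the kernel
default is.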


Notes:

"failed to reclaim memory" can happen even with swap
space enabled but no swap in use: sufficiently active
pages are just not paged out to swap space.

There are some other parameters of possible use for some
other modern "was killed" reason texts.

===
Mark Millard
marklmi at yahoo.com