Re: "failed to reclaim memory" with much free physmem

In reply to: Garrett Wollman : "RE: "failed to reclaim memory" with much free physmem"

From: Rick Macklem <rick.macklem_at_gmail.com>
Date: Fri, 12 Sep 2025 00:22:10 UTC

On Thu, Sep 11, 2025 at 10:58 AM Garrett Wollman <wollman@bimajority.org> wrote:
>
> <<On Tue, 9 Sep 2025 12:19:21 -0700, Mark Millard <marklmi@yahoo.com> said:
>
> > Garrett Wollman <wollman_at_bimajority.org> wrote on
> > Date: Tue, 09 Sep 2025 16:19:42 UTC :
>
> >> On some of our newer large-memory NFS servers, we are seeing services
> >> killed with "failed to reclaim memory". According to our monitoring,
> >> the server has >100G of physmem free at the time,
>
> > Was that 100G+ somewhat before any reclaiming of memory started,
> > the lead-up to the notice?
>
> That was within five minutes of munin-node getting shot by the OOM
> killer.  There was much less memory free ca. 24 hours before the
> event.
>
> > Any likelihood of sudden, rapid, huge drops in free RAM based on
> > workload behavior?
>
> I don't have access to client workloads, but it would have to be a bug
> in ZFS if so; these are file servers, all they run is NFS.
Bug or tuning weakness?
If you look at sys/contrib/openzfs/module/os/linux/zfs/arc_os.c, it does
a bunch of arm-waving setting arc_sys_free whereas
sys/contrib/openzfs/module/os/freebsd/zfs/arc_os.c doesn't do anything.
--> I'd try tuning it via vfs.zfs.arc.sys_free?
(The default is 0 and that says "use all of the memory" if I read it
correctly. I probably haven't read it correctly, which was why I suggested
you compare the two of them.)
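
For anyone who wants to experiment with that knob, here is a rough sketch
of how a candidate value could be worked out and turned into a sysctl
assignment. It is an illustration, not a recommendation: the divisor of 64
is a made-up starting point, the helper names are hypothetical, and whether
a non-zero vfs.zfs.arc.sys_free actually changes the behaviour is exactly
the open question above. It assumes sysctl(8) with the -n flag and the
hw.physmem OID, as found on FreeBSD.

```python
import subprocess

# OID name taken from the message above; treat it as an assumption
# and check that it exists on your release before relying on it.
SYS_FREE_OID = "vfs.zfs.arc.sys_free"


def read_sysctl(oid):
    """Return the integer value of a sysctl OID by running `sysctl -n`."""
    out = subprocess.run(
        ["sysctl", "-n", oid], check=True, capture_output=True, text=True
    )
    return int(out.stdout.strip())


def candidate_sys_free(physmem_bytes, divisor):
    """Reserve physmem/divisor bytes; the divisor is purely illustrative."""
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    return physmem_bytes // divisor


def sysctl_assignment(value):
    """Format a line usable with sysctl(8) or in /etc/sysctl.conf."""
    return f"{SYS_FREE_OID}={value}"


def main():
    # Only meaningful on a FreeBSD host with ZFS loaded.
    physmem = read_sysctl("hw.physmem")
    current = read_sysctl(SYS_FREE_OID)
    proposed = candidate_sys_free(physmem, 64)  # hypothetical divisor
    print(f"current: {current}")
    print(f"try:     sysctl {sysctl_assignment(proposed)}")
```

Calling main() on the server only prints the current value and a suggested
command; it does not change anything by itself.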

rick

>
> > Is NUMA involved?
>
> Damn if I know.
>
> >> and the only
> >> solution seems to be rebooting. (There is a small amount of swap
> >> configured and even less of it in use.)
>
> > That swap is in use at all could be of interest. I wonder
> > what it was doing when the swap was put to use or laundry
> > was growing that led to swap being put to use.
>
> It's pretty normal on these servers, which stay up for six months
> between OS upgrades, for some userland daemons to get swapped out,
> although I agree that it seems like it shouldn't happen given that the
> size of memory (1 TiB) is much greater than the size of running
> processes (< 1 GiB).
>
> My suspicion here is that there's some sort of accounting error, but I
> don't know where to look, and I only have data retrospectively, and
> only the data that munin is collecting.  (Someone else was on call
> when this happened most recently and they reported that their login
> shell kept on getting shot -- as was the getty on the serial console.)
>
> -GAWollman
>
>