"failed to reclaim memory" with much free physmem

From: Garrett Wollman <wollman_at_bimajority.org>
Date: Tue, 09 Sep 2025 16:19:42 UTC

On some of our newer large-memory NFS servers, we are seeing services
killed with "failed to reclaim memory".  According to our monitoring,
the server has >100G of physmem free at the time, and the only
solution seems to be rebooting.  (There is a small amount of swap
configured and even less of it in use.)  Does this sound familiar to
anyone?  What should we be monitoring that we evidently aren't now?

-GAWollman