Re: The pagedaemon evicts ARC before scanning the inactive page list
Date: Tue, 18 May 2021 22:00:14 UTC
On Tue, May 18, 2021 at 3:45 PM Mark Johnston <markj@freebsd.org> wrote:
> On Tue, May 18, 2021 at 03:07:44PM -0600, Alan Somers wrote:
> > I'm using ZFS on servers with tons of RAM and running FreeBSD
> > 12.2-RELEASE. Sometimes they get into a pathological situation where
> > most of that RAM sits unused. For example, right now one of them has:
> >
> > 2 GB Active
> > 529 GB Inactive
> > 16 GB Free
> > 99 GB ARC total
> > 469 GB ARC max
> > 86 GB ARC target
> >
> > When a server gets into this situation, it stays there for days, with the
> > ARC target barely budging. All that inactive memory never gets reclaimed
> > and put to good use. Frequently the server never recovers until a reboot.
> >
> > I have a theory for what's going on. Ever since r334508^ the pagedaemon
> > sends the vm_lowmem event _before_ it scans the inactive page list. If
> > the ARC frees enough memory, then vm_pageout_scan_inactive won't need to
> > free any. Is that order really correct? For reference, here's the
> > relevant code, from vm_pageout_worker:
>
> That was the case even before r334508. Note that prior to that revision
> vm_pageout_scan_inactive() would trigger vm_lowmem if pass > 0, before
> scanning the inactive queue. During a memory shortage we have pass > 0.
> pass == 0 only when the page daemon is scanning the active queue.
>
> > shortage = pidctrl_daemon(&vmd->vmd_pid, vmd->vmd_free_count);
> > if (shortage > 0) {
> >     ofree = vmd->vmd_free_count;
> >     if (vm_pageout_lowmem() && vmd->vmd_free_count > ofree)
> >         shortage -= min(vmd->vmd_free_count - ofree,
> >             (u_int)shortage);
> >     target_met = vm_pageout_scan_inactive(vmd, shortage,
> >         &addl_shortage);
> > } else
> >     addl_shortage = 0;
> >
> > Raising vfs.zfs.arc_min seems to work around the problem. But ideally
> > that wouldn't be necessary.
>
> vm_lowmem is too primitive: it doesn't tell subscribing subsystems
> anything about the magnitude of the shortage. At the same time, the VM
> doesn't know much about how much memory they are consuming. A better
> strategy, at least for the ARC, would be to reclaim memory based on the
> relative memory consumption of each subsystem. In your case, when the
> page daemon goes to reclaim memory, it should use the inactive queue to
> make up ~85% of the shortfall and reclaim the rest from the ARC. Even
> better would be if the ARC could use the page cache as a second-level
> cache, like the buffer cache does.
>
> Today I believe the ARC treats vm_lowmem as a signal to shed some
> arbitrary fraction of evictable data. If the ARC is able to quickly
> answer the question, "how much memory can I release if asked?", then
> the page daemon could use that to determine how much of its reclamation
> target should come from the ARC vs. the page cache.
>
I guess I don't understand why you would ever free from the ARC rather than
from the inactive list. When is inactive memory ever useful?
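
For concreteness, here is how I read the proportional split described above.
This is only a sketch under my own assumptions: split_shortage(), struct
reclaim_split, and the arc_evictable argument are hypothetical names, not
existing kernel or ZFS interfaces, and it presumes the ARC could report how
much it is able to release.

```c
#include <stdint.h>

/* Hypothetical result type: how a reclamation target is divided. */
struct reclaim_split {
	uint64_t from_inactive;	/* amount to take from the inactive queue */
	uint64_t from_arc;	/* amount to ask the ARC to release */
};

/*
 * Divide `shortage` between the inactive queue and the ARC in proportion
 * to what each holds.  Illustrative only; assumes the ARC can report
 * `arc_evictable` and that shortage * arc_evictable fits in 64 bits.
 */
static struct reclaim_split
split_shortage(uint64_t shortage, uint64_t inactive, uint64_t arc_evictable)
{
	struct reclaim_split s = { 0, 0 };
	uint64_t total = inactive + arc_evictable;

	if (total == 0)
		return (s);	/* nothing reclaimable from either source */

	/* The ARC's share is its fraction of all reclaimable memory. */
	s.from_arc = shortage * arc_evictable / total;
	if (s.from_arc > arc_evictable)
		s.from_arc = arc_evictable;

	/* The inactive queue covers the rest, up to what it holds. */
	s.from_inactive = shortage - s.from_arc;
	if (s.from_inactive > inactive)
		s.from_inactive = inactive;

	return (s);
}
```

With the numbers from my server (529 GB inactive, 99 GB ARC), a shortage of
100 units would come out as 85 from the inactive queue and 15 from the ARC,
which matches the ~85% figure above.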