Re: RFC: How ZFS handles arc memory use

From: Rick Macklem <rick.macklem_at_gmail.com>
Date: Wed, 22 Oct 2025 15:21:42 UTC
On Wed, Oct 22, 2025 at 8:05 AM Alexander Motin <mav@freebsd.org> wrote:
>
> Hi Rick,
>
> On 22.10.2025 10:34, Rick Macklem wrote:
> > A couple of people have reported problems with NFS servers,
> > where essentially all of the system's memory gets exhausted.
> > They see the problem on 14.n FreeBSD servers (which use the
> > newer ZFS code) but not on 13.n servers.
> >
> > I am trying to learn how ZFS handles arc memory use to try
> > and figure out what can be done about this problem.
> >
> > I know nothing about ZFS internals or UMA(9) internals,
> > so I could be way off, but here is what I think is happening.
> > (Please correct me on this.)
> >
> > The L1ARC uses uma_zalloc_arg()/uma_zfree_arg() to allocate
> > the arc memory. The zones are created using uma_zcreate(),
> > so they are regular zones. This means the pages come from
> > slabs in a keg, and those pages are wired.
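
As a concrete sketch of what I mean by "regular zones" (illustrative
only; the names here are made up and the real ZFS code is more
involved):

    /*
     * Minimal sketch of a regular (non-pcpu) UMA zone, per uma(9).
     * "example_zone" etc. are made-up names, not the ZFS ones.
     */
    #include <sys/param.h>
    #include <sys/malloc.h>
    #include <vm/uma.h>

    static uma_zone_t example_zone;

    static void
    example_init(void)
    {
            /* Backing pages come from slabs in the zone's keg (wired). */
            example_zone = uma_zcreate("example zone", PAGE_SIZE,
                NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, 0);
    }

    static void *
    example_alloc(void)
    {
            /* uma_zalloc() is a wrapper around uma_zalloc_arg(). */
            return (uma_zalloc(example_zone, M_NOWAIT));
    }
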
> >
> > The only time the size of the slab/keg will be reduced by ZFS
> > is when it calls uma_zone_reclaim(.., UMA_RECLAIM_DRAIN),
> > which is called by arc_reap_cb(), triggered by arc_reap_cb_check().
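
That is, something like this sketch (the real arc_reap_cb() path does
more, but this is the UMA call at the bottom of it, as I read it):

    #include <vm/uma.h>

    /*
     * Sketch: drain one zone's cached items back to the VM.
     * UMA_RECLAIM_DRAIN frees cached buckets/slabs but leaves the
     * per-CPU caches alone; UMA_RECLAIM_DRAIN_CPU would empty those too.
     */
    static void
    example_drain(uma_zone_t zone)
    {
            uma_zone_reclaim(zone, UMA_RECLAIM_DRAIN);
    }
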
> >
> > arc_reap_cb_check() uses arc_available_memory() and triggers
> > arc_reap_cb() when arc_available_memory() returns a negative
> > value.
> >
> > arc_available_memory() returns a negative value when
> > zfs_arc_free_target (vfs.zfs.arc.free_target) is greater than freemem.
> > (By default, zfs_arc_free_target is set to vm_cnt.v_free_target.)
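
In other words, as I read it the check boils down to something like
this simplified sketch (the real arc_available_memory() considers more
cases; freemem and zfs_arc_free_target are both counted in pages):

    /*
     * Simplified sketch, not the real code.  A negative return value
     * means "memory is short", which makes arc_reap_cb_check() fire
     * arc_reap_cb().
     */
    static int64_t
    sketch_available_memory(void)
    {
            return ((int64_t)PAGE_SIZE *
                ((int64_t)freemem - (int64_t)zfs_arc_free_target));
    }
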
> >
> > Does all of the above sound about right?
>
> There are two mechanisms to reduce the ARC size: either from the ZFS
> side in the way you described, or from the kernel side, when it calls
> the ZFS low-memory handler arc_lowmem().  It feels somewhat like
> overkill, but it came this way from Solaris.
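
(For the archives: I believe arc_lowmem() is wired up via the
vm_lowmem eventhandler; a sketch of that registration pattern, under
that assumption, with made-up names:)

    #include <sys/param.h>
    #include <sys/eventhandler.h>

    /* Assumed pattern; the real registration lives in the ZFS code. */
    static eventhandler_tag example_lowmem_tag;

    static void
    example_lowmem(void *arg __unused, int flags __unused)
    {
            /* Kick ARC reclamation when the VM signals memory pressure. */
    }

    static void
    example_register(void)
    {
            example_lowmem_tag = EVENTHANDLER_REGISTER(vm_lowmem,
                example_lowmem, NULL, EVENTHANDLER_PRI_FIRST);
    }
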
>
> Once the ARC size is reduced and evictions into the UMA caches have
> happened, it is up to UMA how to drain its caches.  ZFS might trigger
> that itself, or the kernel can do it; also, a few years back I added
> a mechanism for UMA caches to slowly shrink by themselves even
> without pressure.
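
(I take this to be the periodic trimming that uses UMA_RECLAIM_TRIM to
release items beyond a zone's recent working set; a sketch, under that
assumption:)

    #include <vm/uma.h>

    /*
     * Sketch: the gentler reclaim level from uma(9).  UMA_RECLAIM_TRIM
     * releases only items beyond the zone's estimated working set, so
     * caches can shrink gradually even without memory pressure.
     */
    static void
    example_trim(uma_zone_t zone)
    {
            uma_zone_reclaim(zone, UMA_RECLAIM_TRIM);
    }
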
>
> > This leads me to...
> > - zfs_arc_free_target (vfs.zfs.arc.free_target) needs to be larger
>
> There is a very delicate balance between ZFS and the kernel
> (zfs_arc_free_target = vm_cnt.v_free_target).  An imbalance there
> makes one of them suffer.
>
> > or
> > - Most of the wired pages in the slab are per-cpu,
> >    so uma_zone_reclaim() needs to use UMA_RECLAIM_DRAIN_CPU
> >    on some systems. (Not the small test systems I have, where I
> >    cannot reproduce the problem.)
>
> Per-CPU caches should be relatively small; IIRC, on the order of
> dozens or hundreds of allocations per CPU.  Draining them is
> expensive and should rarely be needed, unless you have too little RAM
> for the number of CPUs you have.
>
> > or
> > - uma_zone_reclaim() needs to be called under other
> >    circumstances.
> > or
> > - ???
> >
> > How can you tell if a keg/slab is per-cpu?
> > (For my simple test system, I only see "UMA Slabs 0:" and
> > "UMA Slabs 1:". It looks like UMA Slabs 0: is being used for
> > ZFS arc allocation on this system.)
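
(My current understanding, which could be wrong: a truly per-CPU zone
is one created with the UMA_ZONE_PCPU flag, which is distinct from the
per-CPU bucket caches every regular zone keeps.  A sketch, under that
assumption:)

    #include <sys/param.h>
    #include <vm/uma.h>

    /*
     * Sketch: a UMA_ZONE_PCPU zone hands out one item per CPU per
     * allocation, the way counter(9) is implemented.  Regular zones
     * only keep per-CPU bucket caches in front of their keg.
     */
    static uma_zone_t example_pcpu_zone;

    static void
    example_pcpu_init(void)
    {
            example_pcpu_zone = uma_zcreate("example pcpu",
                sizeof(uint64_t), NULL, NULL, NULL, NULL,
                UMA_ALIGN_PTR, UMA_ZONE_PCPU);
    }
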
> >
> > Hopefully folk who understand ZFS arc allocation or UMA
> > can jump in and help out, rick
>
> Before you dive into UMA, have you checked whether the ARC size
> really shrinks and eviction happens?  Since you mention NFS, I wonder
> what your number of open files is.  Too many open files might in some
> cases restrict ZFS's ability to evict metadata from the ARC.
> arc_summary may give some insight into the ARC state.
I don't know if this helps, but the original post is here:
https://lists.freebsd.org/archives/freebsd-stable/2025-September/003126.html
The email thread that follows it is here:
https://lists.freebsd.org/archives/freebsd-stable/2025-September/003145.html

Hopefully Garrett can respond with more information, rick

>
> --
> Alexander Motin