Re: RFC: How ZFS handles arc memory use
Date: Wed, 29 Oct 2025 21:06:48 UTC
Around the switch to OpenZFS I started seeing the arc clean task running
at 100% on a core. I use nullfs on my laptop to map my shared ZFS /data
partition into a few vnet instances, and over the course of a night or so
I would run into this issue. I found that a bunch of vnodes were being
held by the other layers. My solution was to reduce kern.maxvnodes and
vfs.zfs.arc.max so that the ARC stayed at a reasonable size without
starving other applications.
That is why, a while back, I added the vnode count to mount -v, so that
I could see the vnode usage for each mount point. I made a script
to report on things:
#!/bin/sh
# Report the ARC- and vnode-related sysctls, then the ten mount points
# holding the most vnodes (using the vnode count shown by mount -v).
(
sysctl kstat.zfs.misc.arcstats.arc_prune
sysctl kstat.zfs.misc.arcstats.arc_raw_size
sysctl kstat.zfs.misc.arcstats.c_max
sysctl vfs.zfs.arc_max
sysctl vfs.wantfreevnodes
sysctl vfs.freevnodes
sysctl vfs.vnodes_created
sysctl vfs.numvnodes
sysctl vfs.vnode.vnlru.kicks
) | awk '{ printf "%40s %10d\n", $1, $2 }'

mount -v | while read fsname junk mount params
do
# Pull the vnode count out of the parenthesized option list,
# i.e. everything after the last "count " with the trailing ")" stripped.
count=`echo "$params" | sed 's/^.*count //' | sed 's/[ )]*$//'`
echo "$count $fsname $mount"
done | awk '{ printf "%10d %s %s\n", $1, $2, $3 }' | sort -nr | head
With kern.maxvnodes and vfs.zfs.arc.max reduced, my system stays reasonably
responsive for weeks or longer, assuming I can suspend and resume without a crash.
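Roughly, what I mean by "reduced" is something like the following; the
numbers are only placeholders for illustration (not recommendations), and
the right values depend on how much RAM the machine has:

#!/bin/sh
# Illustration only: cap the vnode count and the ARC size at runtime.
# The values below are placeholders; pick ones that fit your RAM.
sysctl kern.maxvnodes=400000        # the default scales with RAM and is usually much larger
sysctl vfs.zfs.arc.max=4294967296   # cap the ARC at 4 GiB (0 selects the auto-tuned default)
# To keep the settings across reboots, put the same assignments
# (without the leading "sysctl") in /etc/sysctl.conf.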
Thanks,
Doug A.
On Wed, Oct 22, 2025 at 10:46:50PM +0200, Peter Eriksson wrote:
| I too am seeing this issue with some of our FreeBSD 14.n “backup” servers. However, they are not NFS servers - they are rsync targets backing up the user-facing NFS (& SMB) servers.
|
| Every now and then (about once a month) I see the load numbers spike and disk I/O go through the roof - and then performance grinds to a virtual halt (the machine responds extremely slowly to shell/console I/O) for many hours before it partly resolves itself. A reboot fixes the issue more thoroughly, and then it runs fine for a couple of months before it happens again.
|
| Our user-facing (NFS & SMB-serving) servers are all still running FreeBSD 13.x in order to avoid this issue - but eventually we will be forced to upgrade to 14 when 13 reaches EOL next year (so it would be nice if this issue could be fixed before then :-)...
|
| The 14.3 servers involved each have 512-768 GB of RAM, a great many ZFS filesystems (22k on one, 111k on the other) and many files and directories (somewhere around 1G on one, 2.5G on the other).
|
| I’ve seen some posts indicating that this issue (or something similar) has been seen in the Linux world too, and there it was (I think) related to inode/directory caching causing pinned memory to be allocated that couldn’t easily be released - and since my rsync backup servers scan all files/directories when doing backups, this sounds plausible…
|
| Here’s a graph showing some measurements when this happens - the graphs aren’t perfectly aligned, but things start at 22:00 when the backups begin - that is when the load number starts to spike, and at the same time something causes the vfs.numvnodes counter to drop drastically. The ZFS memory usage doesn’t seem to change much at that time though… We measure a lot of different values, so let me know if there’s some other number someone would like to see :-)
|
|
|
| - Peter
|
| > On 22 Oct 2025, at 16:34, Rick Macklem <rick.macklem@gmail.com> wrote:
| >
| > Hi,
| >
| > A couple of people have reported problems with NFS servers,
| > where essentially all of the system's memory gets exhausted.
| > They see the problem on 14.n FreeBSD servers (which use the
| > newer ZFS code) but not on 13.n servers.
| >
| > I am trying to learn how ZFS handles arc memory use to try
| > and figure out what can be done about this problem.
| >
| > I know nothing about ZFS internals or UMA(9) internals,
| > so I could be way off, but here is what I think is happening.
| > (Please correct me on this.)
| >
| > The L1ARC uses uma_zalloc_arg()/uma_zfree_arg() to allocate
| > the arc memory. The zones are created using uma_zcreate(),
| > so they are regular zones. This means the pages are coming
| > from a slab in a keg, which are wired pages.
| >
| > The only time the size of the slab/keg will be reduced by ZFS
| > is when it calls uma_zone_reclaim(.., UMA_RECLAIM_DRAIN),
| > which is called by arc_reap_cb(), triggered by arc_reap_cb_check().
| >
| > arc_reap_cb_check() uses arc_available_memory() and triggers
| > arc_reap_cb() when arc_available_memory() returns a negative
| > value.
| >
| > arc_available_memory() returns a negative value when
| > zfs_arc_free_target (vfs.zfs.arc.free_target) is greater than freemem.
| > (By default, zfs_arc_free_target is set to vm_cnt.v_free_target.)
| >
| > Does all of the above sound about right?
| >
| > This leads me to...
| > - zfs_arc_free_target (vfs.zfs.arc.free_target) needs to be larger
| > or
| > - Most of the wired pages in the slab are held in per-cpu caches,
| > so uma_zone_reclaim() needs to be called with UMA_RECLAIM_DRAIN_CPU
| > on some systems. (Not the small test systems I have, where I
| > cannot reproduce the problem.)
| > or
| > - uma_zone_reclaim() needs to be called under other
| > circumstances.
| > or
| > - ???
| >
| > How can you tell if a keg/slab is per-cpu?
| > (For my simple test system, I only see "UMA Slabs 0:" and
| > "UMA Slabs 1:". It looks like UMA Slabs 0: is being used for
| > ZFS arc allocation for this simple test system.)
| >
| > Hopefully folk who understand ZFS arc allocation or UMA
| > can jump in and help out, rick
|
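P.S. In case it helps anyone poke at this, here is a quick sh sketch of the
check Rick describes above, as I understand it: freemem should correspond to
the vm.stats.vm.v_free_count sysctl (my assumption), and arc_available_memory()
goes negative once that drops below vfs.zfs.arc.free_target. The vmstat -z grep
at the end is just a rough way to watch the ZFS-related UMA zones; the zone
names may differ between versions.

#!/bin/sh
# Sketch only: compare the page counts that (as I understand it)
# arc_available_memory() compares on FreeBSD.  When free pages drop
# below vfs.zfs.arc.free_target, the ARC reap callback should fire.
free=`sysctl -n vm.stats.vm.v_free_count`
target=`sysctl -n vfs.zfs.arc.free_target`
echo "free pages:      $free"
echo "arc free_target: $target"
if [ "$free" -lt "$target" ]; then
echo "freemem < free_target: ARC reclaim should be kicking in"
else
echo "freemem >= free_target: no reclaim pressure from this check"
fi

# Rough look at the ZFS-related UMA zones (zone names vary by version);
# USED is items in use, FREE is items cached in the zone.
vmstat -z | grep -E 'abd_chunk|arc_buf|dnode_t|zio_buf|zio_data_buf' | head -n 20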