Re: RFC: How ZFS handles arc memory use
- In reply to: Alexander Leidinger : "Re: RFC: How ZFS handles arc memory use"
Date: Thu, 05 Mar 2026 23:36:20 UTC
On Wed, Mar 04, 2026 at 11:03:42AM +0100, Alexander Leidinger wrote:
> On 2026-03-03 23:45, Doug Ambrisko wrote:
> > On Tue, Mar 03, 2026 at 02:25:11PM -0800, Rick Macklem wrote:
> > | On Tue, Mar 3, 2026 at 12:33 PM Doug Ambrisko <ambrisko@ambrisko.com> wrote:
> > | >
> > | > On Sun, Nov 02, 2025 at 11:48:06AM +0100, Alexander Leidinger wrote:
> > | > | On 2025-10-29 22:06, Doug Ambrisko wrote:
> > | > | > It seems around the switch to OpenZFS I would have arc clean task
> > | > | > running 100% on a core. I use nullfs on my laptop to map my
> > | > | > shared ZFS /data partition into a few vnet instances. Overnight
> > | > | > or so I would get into this issue. I found that I had a bunch of
> > | > | > vnodes being held by other layers. My solution was to reduce
> > | > | > kern.maxvnodes and vfs.zfs.arc.max so the ARC cache stayed
> > | > | > reasonable without killing other applications.
> > | > | >
> > | > | > That is why a while back I added the vnode count to mount -v so
> > | > | > that I could see the usage of vnodes for each mount point. I
> > | > | > made a script to report on things:
> > | > |
> > | > | Do you see this also with the nullfs mount option "nocache"?
> > | >
> > | > I seem to have run into this issue with nocache:
> > | > /data/jail/current/usr/local/etc/cups /data/jail/current-other/usr/local/etc/cups nullfs rw,nocache 0 0
> > | > /data/jail/current/usr/local/etc/sane.d /data/jail/current-other/usr/local/etc/sane.d nullfs rw,nocache 0 0
> > | > /data/jail/current/usr/local/www /data/jail/current-other/usr/local/www nullfs rw,nocache 0 0
> > | > /data/jail/current/usr/local/etc/nginx /data/jail/current-other/usr/local/etc/nginx nullfs rw,nocache 0 0
> > | > /data/jail/current/tftpboot /data/jail/current-other/tftpboot nullfs rw,nocache 0 0
> > | > /data/jail/current/usr/local/lib/grub /data/jail/current-other/usr/local/lib/grub nullfs rw,nocache 0 0
> > | > /data/jail /data/jail/current-other/data/jail nullfs rw,nocache 0 0
> > | > /data/jail /data/jail/current/data/jail nullfs rw,nocache 0 0
> > | >
> > | > After a while (a couple of months or more), my laptop was running
> > | > slow with a high load. The periodic find was running slow.
> > | > arc_prune was spinning. When I reduced the number of vnodes,
> > | > things got better. My vfs.zfs.arc_max is 1073741824 so that I
> > | > have memory for other things.
> > | >
> > | > nocache does help: it takes longer to get into this situation.
> > | Have any of you guys tried increasing vfs.zfs.arc.free_target?
> > |
> > | If I understand the code correctly, when freemem <
> > | vfs.zfs.arc.free_target the reaper thread (the one that does
> > | uma_zone_reclaim() to return pages to the system from the uma keg
> > | that the arc uses) should be activated.
> >
> > I haven't tried that. I set:
> >         kern.maxvnodes
> >         vfs.zfs.arc.min
> >         vfs.zfs.arc.max
> >         vfs.zfs.prefetch.disable=1
> >
> > I need to make sure kern.maxvnodes is small enough so it doesn't thrash
> > when vfs.zfs.arc.max is set to 1G. The issues tend to take a while to
> > happen.
> > On the plus side I can adjust these when I hit them, mostly by
> > reducing kern.maxvnodes, without having to do a reboot.
>
> There was this commit recently:
> https://cgit.freebsd.org/src/commit/sys/fs/nullfs?id=8b64d46fab87af3ae062901312187f3a04ad2d67
>
> I have not checked if this race condition could result in anything
> related to what we see. From the commit message I cannot deduce whether
> it could, for example, lead to a (even temporary) resource leak which may
> explain this behavior. Mark, what is the high-level result of this race
> condition you fixed in nullfs? At first look at the commit log I would
> rather assume that vnodes of the lower FS could be freed too early,
> rather than not freed at all, because of the race condition.

The high-level result would be a lock leak and presumably an eventual
deadlock or crash. In an INVARIANTS kernel you'd get an assertion
failure. I doubt the bug can be responsible for the issues reported in
this thread.
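
For anyone who wants to watch the two limits discussed in this thread (kern.maxvnodes and vfs.zfs.arc.max) while waiting for the problem to reappear, here is a minimal sketch, not anything posted by the participants. It assumes a FreeBSD host where the sysctls kern.maxvnodes, vfs.numvnodes, vfs.zfs.arc.max and kstat.zfs.misc.arcstats.size are available. The helper names and the report layout are made up for illustration, and a limit of 0 is treated here as "unset", which is an assumption of the sketch.

```python
import subprocess

# Sysctl names as found on FreeBSD with OpenZFS; which ones to sample and
# how to present them are assumptions of this sketch.
SYSCTLS = (
    "kern.maxvnodes",                # vnode limit tuned in the thread
    "vfs.numvnodes",                 # vnodes currently allocated
    "vfs.zfs.arc.max",               # ARC size limit in bytes
    "kstat.zfs.misc.arcstats.size",  # current ARC size in bytes
)


def read_sysctl(name):
    """Return one integer sysctl value via `sysctl -n` (FreeBSD only)."""
    out = subprocess.run(
        ["sysctl", "-n", name], check=True, capture_output=True, text=True
    )
    return int(out.stdout.strip())


def snapshot(read=read_sysctl):
    """Sample every sysctl in SYSCTLS; `read` can be swapped out for testing."""
    return {name: read(name) for name in SYSCTLS}


def percent(used, limit):
    """Usage as a percentage, or None when the limit is 0 (treated as unset)."""
    return None if limit == 0 else round(100.0 * used / limit, 1)


def report(values):
    """Summarise how close vnode and ARC usage are to their limits."""
    return {
        "vnodes_pct": percent(values["vfs.numvnodes"], values["kern.maxvnodes"]),
        "arc_pct": percent(
            values["kstat.zfs.misc.arcstats.size"], values["vfs.zfs.arc.max"]
        ),
    }
```

On a FreeBSD machine, `print(report(snapshot()))` run periodically (for example from cron) would show whether vnode usage is pinned near kern.maxvnodes at the time the pruning thread starts spinning.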