From nobody Wed Oct 29 21:06:48 2025
Date: Wed, 29 Oct 2025 14:06:48 -0700
From: Doug Ambrisko
To: Peter Eriksson
Cc: Rick Macklem, FreeBSD CURRENT, Garrett Wollman, Alexander Motin
Subject: Re: RFC: How ZFS handles arc memory use
List-Archive: https://lists.freebsd.org/archives/freebsd-current

It seems that around the switch to OpenZFS, the ARC clean task would
run at 100% on a core. I use nullfs on my laptop to map my shared ZFS
/data partition into a few vnet instances. Overnight or so I would get
into this issue. I found that I had a bunch of vnodes being held by
other layers. My solution was to reduce kern.maxvnodes and
vfs.zfs.arc.max so the ARC cache stayed reasonable without killing
other applications. That is why, a while back, I added the vnode count
to mount -v, so that I could see the vnode usage for each mount point.
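As a rough illustration, the two knobs mentioned above can be pinned
across reboots in /etc/sysctl.conf. The values below are placeholders,
not recommendations; appropriate limits depend on the machine's RAM
and workload:

```
# Illustrative values only -- tune for your RAM and workload.
# Cap the number of vnodes the system will cache.
kern.maxvnodes=400000
# Cap the ZFS ARC size in bytes (here: 8 GiB).
vfs.zfs.arc.max=8589934592
```

The same names can be set at runtime with sysctl(8) to experiment
before committing values to the config file.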
I made a script to report on things:

#!/bin/sh
(
sysctl kstat.zfs.misc.arcstats.arc_prune
sysctl kstat.zfs.misc.arcstats.arc_raw_size
sysctl kstat.zfs.misc.arcstats.c_max
sysctl vfs.zfs.arc_max
sysctl vfs.wantfreevnodes
sysctl vfs.freevnodes
sysctl vfs.vnodes_created
sysctl vfs.numvnodes
sysctl vfs.vnode.vnlru.kicks
) | awk '{ printf "%40s %10d\n", $1, $2 }'
mount -v | while read fs_type junk mount params
do
	count=`echo "$params" | sed 's/^.*count //' | sed 's/ [\)]//'`
	#echo "$fs_type, $mount, $params"
	echo "$count $fs_type $mount"
done | awk '{ printf "%10d %5s %s\n", $1, $2, $3 }' | sort -nr | head

With a reduced vnode max and ARC max, my system stays reasonably
responsive for weeks or longer, assuming I can suspend and resume
without a crash.

Thanks,

Doug A.

On Wed, Oct 22, 2025 at 10:46:50PM +0200, Peter Eriksson wrote:
| I too am seeing this issue with some of our FreeBSD 14.n "backup"
| servers. However, they are not NFS servers - they are rsync targets
| backing up the user-facing NFS (and SMB) servers.
|
| Every now and then (about once a month) I see the load numbers
| spiking and disk I/O going through the roof - and then performance
| grinds to a virtual halt (the machine responds extremely slowly to
| shell/console I/O) for many hours before it resolves itself (partly).
| A reboot fixes the issue better, though, and then it runs fine for a
| couple of months before it happens again.
|
| Our user-facing (NFS- and SMB-serving) servers are all still running
| FreeBSD 13.x in order to avoid this issue - but eventually we will be
| forced to upgrade to 14 when 13 gets EOLd next year (so it would be
| nice if this issue could be fixed before then :-)...
|
| The 14.3-running servers involved each have 512-768GB of RAM, a great
| many ZFS filesystems (22k on one, 111k on the other) and many files
| and directories (somewhere around 1G on one, 2.5G on the other).
|
| I've seen posts suggesting that this issue (or something similar) has
| been seen in the Linux world too, and that it was (I think) related
| to inode/vnode directory caching causing pinned memory to be
| allocated that couldn't easily be released - and since my rsync
| backup servers scan all files and directories when doing backups,
| this sounds plausible...
|
| Here's a graph showing some measurements from when this happens - the
| graphs aren't perfectly aligned, but things start at 22:00 when the
| backups begin. That is when the load numbers start to spike, and at
| the same time something causes the vfs.numvnodes counter to drop
| drastically. The ZFS memory usage doesn't seem to change much at that
| time, though... We measure a lot of different values, so let me know
| if there's some other number someone would like to see :-)
|
| - Peter
|
| > On 22 Oct 2025, at 16:34, Rick Macklem wrote:
| >
| > Hi,
| >
| > A couple of people have reported problems with NFS servers where
| > essentially all of the system's memory gets exhausted. They see the
| > problem on 14.n FreeBSD servers (which use the newer ZFS code) but
| > not on 13.n servers.
| >
| > I am trying to learn how ZFS handles ARC memory use, to try to
| > figure out what can be done about this problem.
| >
| > I know nothing about ZFS internals or UMA(9) internals, so I could
| > be way off, but here is what I think is happening. (Please correct
| > me on this.)
| >
| > The L1ARC uses uma_zalloc_arg()/uma_zfree_arg() to allocate the ARC
| > memory. The zones are created using uma_zcreate(), so they are
| > regular zones. This means the pages are coming from a slab in a
| > keg, and those pages are wired.
| >
| > The only time ZFS reduces the size of the slab/keg is when it calls
| > uma_zone_reclaim(.., UMA_RECLAIM_DRAIN), which is called by
| > arc_reap_cb(), triggered by arc_reap_cb_check().
| >
| > arc_reap_cb_check() uses arc_available_memory() and triggers
| > arc_reap_cb() when arc_available_memory() returns a negative value.
| >
| > arc_available_memory() returns a negative value when
| > zfs_arc_free_target (vfs.zfs.arc.free_target) is greater than
| > freemem. (By default, zfs_arc_free_target is set to
| > vm_cnt.v_free_target.)
| >
| > Does all of the above sound about right?
| >
| > This leads me to...
| > - zfs_arc_free_target (vfs.zfs.arc.free_target) needs to be larger,
| > or
| > - most of the wired pages in the slab are per-CPU, so
| >   uma_zone_reclaim() needs to use UMA_RECLAIM_DRAIN_CPU on some
| >   systems (not the small test systems I have, where I cannot
| >   reproduce the problem),
| > or
| > - uma_zone_reclaim() needs to be called under other circumstances,
| > or
| > - ???
| >
| > How can you tell if a keg/slab is per-CPU? (For my simple test
| > system, I only see "UMA Slabs 0:" and "UMA Slabs 1:". It looks like
| > "UMA Slabs 0:" is being used for ZFS ARC allocation on this simple
| > test system.)
| >
| > Hopefully folks who understand ZFS ARC allocation or UMA can jump
| > in and help out, rick
|