From nobody Wed Oct 29 21:06:48 2025
Date: Wed, 29 Oct 2025 14:06:48 -0700
From: Doug Ambrisko
To: Peter Eriksson
Cc: Rick Macklem, FreeBSD CURRENT, Garrett Wollman, Alexander Motin
Subject: Re: RFC: How ZFS handles arc memory use
List-Archive: https://lists.freebsd.org/archives/freebsd-current

It seems that around the switch to OpenZFS, the ARC clean task would
run at 100% on a core. I use nullfs on my laptop to map my shared ZFS
/data partition into a few vnet instances. Overnight or so I would get
into this issue. I found that I had a bunch of vnodes being held by
other layers. My solution was to reduce kern.maxvnodes and
vfs.zfs.arc.max so the ARC cache stayed reasonable without killing
other applications. That is why, a while back, I added the vnode count
to mount -v, so that I could see the vnode usage for each mount point.
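As a rough illustration, the two knobs mentioned above can be pinned
across reboots in /etc/sysctl.conf. The values below are placeholders,
not recommendations; appropriate limits depend on the machine's RAM
and workload:

```
# Illustrative values only -- tune for your RAM and workload.
# Cap the number of vnodes the system will cache.
kern.maxvnodes=400000
# Cap the ZFS ARC size in bytes (here: 8 GiB).
vfs.zfs.arc.max=8589934592
```

The same names can be set at runtime with sysctl(8) to experiment
before committing values to the config file.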
I made a script to report on things:

#!/bin/sh
(
sysctl kstat.zfs.misc.arcstats.arc_prune
sysctl kstat.zfs.misc.arcstats.arc_raw_size
sysctl kstat.zfs.misc.arcstats.c_max
sysctl vfs.zfs.arc_max
sysctl vfs.wantfreevnodes
sysctl vfs.freevnodes
sysctl vfs.vnodes_created
sysctl vfs.numvnodes
sysctl vfs.vnode.vnlru.kicks
) | awk '{ printf "%40s %10d\n", $1, $2 }'
mount -v | while read fs_type junk mount params
do
	count=`echo "$params" | sed 's/^.*count //' | sed 's/ [\)]//'`
	#echo "$fs_type, $mount, $params"
	echo "$count $fs_type $mount"
done | awk '{ printf "%10d %5s %s\n", $1, $2, $3 }' | sort -nr | head

With a reduced vnode max and ARC max, my system stays reasonably
responsive for weeks or longer, assuming I can suspend and resume
without a crash.

Thanks,

Doug A.

On Wed, Oct 22, 2025 at 10:46:50PM +0200, Peter Eriksson wrote:
| I too am seeing this issue with some of our FreeBSD 14.n "backup"
| servers. However, they are not NFS servers - they are rsync targets
| backing up the user-facing NFS (and SMB) servers.
|
| Every now and then (about once a month) I see the load numbers
| spiking and disk I/O going through the roof - and then performance
| grinds to a virtual halt (the machine responds extremely slowly to
| shell/console I/O) for many hours before it resolves itself (partly).
| A reboot fixes the issue better, though, and then it runs fine for a
| couple of months before it happens again.
|
| Our user-facing (NFS- and SMB-serving) servers are all still running
| FreeBSD 13.x in order to avoid this issue - but eventually we will be
| forced to upgrade to 14 when 13 gets EOLd next year (so it would be
| nice if this issue could be fixed before then :-)...
|
| The 14.3-running servers involved each have 512-768GB of RAM, a great
| many ZFS filesystems (22k on one, 111k on the other) and many files
| and directories (somewhere around 1G on one, 2.5G on the other).
|
| I've seen posts suggesting that this issue (or something similar) has
| been seen in the Linux world too, and that it was (I think) related
| to inode/vnode directory caching causing pinned memory to be
| allocated that couldn't easily be released - and since my rsync
| backup servers scan all files and directories when doing backups,
| this sounds plausible...
|
| Here's a graph showing some measurements from when this happens - the
| graphs aren't perfectly aligned, but things start at 22:00 when the
| backups begin. That is when the load numbers start to spike, and at
| the same time something causes the vfs.numvnodes counter to drop
| drastically. The ZFS memory usage doesn't seem to change much at that
| time, though... We measure a lot of different values, so let me know
| if there's some other number someone would like to see :-)
|
| - Peter
|
| > On 22 Oct 2025, at 16:34, Rick Macklem wrote:
| >
| > Hi,
| >
| > A couple of people have reported problems with NFS servers where
| > essentially all of the system's memory gets exhausted. They see the
| > problem on 14.n FreeBSD servers (which use the newer ZFS code) but
| > not on 13.n servers.
| >
| > I am trying to learn how ZFS handles ARC memory use, to try to
| > figure out what can be done about this problem.
| >
| > I know nothing about ZFS internals or UMA(9) internals, so I could
| > be way off, but here is what I think is happening. (Please correct
| > me on this.)
| >
| > The L1ARC uses uma_zalloc_arg()/uma_zfree_arg() to allocate the ARC
| > memory. The zones are created using uma_zcreate(), so they are
| > regular zones. This means the pages are coming from a slab in a
| > keg, and those pages are wired.
| >
| > The only time ZFS reduces the size of the slab/keg is when it calls
| > uma_zone_reclaim(.., UMA_RECLAIM_DRAIN), which is called by
| > arc_reap_cb(), triggered by arc_reap_cb_check().
| >
| > arc_reap_cb_check() uses arc_available_memory() and triggers
| > arc_reap_cb() when arc_available_memory() returns a negative value.
| >
| > arc_available_memory() returns a negative value when
| > zfs_arc_free_target (vfs.zfs.arc.free_target) is greater than
| > freemem. (By default, zfs_arc_free_target is set to
| > vm_cnt.v_free_target.)
| >
| > Does all of the above sound about right?
| >
| > This leads me to...
| > - zfs_arc_free_target (vfs.zfs.arc.free_target) needs to be larger,
| > or
| > - most of the wired pages in the slab are per-CPU, so
| >   uma_zone_reclaim() needs to use UMA_RECLAIM_DRAIN_CPU on some
| >   systems (not the small test systems I have, where I cannot
| >   reproduce the problem),
| > or
| > - uma_zone_reclaim() needs to be called under other circumstances,
| > or
| > - ???
| >
| > How can you tell if a keg/slab is per-CPU? (For my simple test
| > system, I only see "UMA Slabs 0:" and "UMA Slabs 1:". It looks like
| > "UMA Slabs 0:" is being used for ZFS ARC allocation on this simple
| > test system.)
| >
| > Hopefully folks who understand ZFS ARC allocation or UMA can jump
| > in and help out, rick
|