Re: FYI; 14.3: A discord report of Wired Memory growing to 17 GiBytes over something like 60 days; ARC shrinks to, say, 1942 MiBytes

From: Rick Macklem <rick.macklem_at_gmail.com>
Date: Thu, 14 Aug 2025 00:42:55 UTC
On Wed, Aug 13, 2025 at 3:20 PM Rick Macklem <rick.macklem@gmail.com> wrote:
>
> On Wed, Aug 13, 2025 at 12:47 AM Darrin Smith <beldin@beldin.org> wrote:
> >
> > On Tue, 12 Aug 2025 12:57:39 +0300
> > Konstantin Belousov <kostikbel@gmail.com> wrote:
> >
> >
> > > Start looking at differences in periodic shots of vmstat -z and
> > > vmstat -m. It would not catch direct page allocators.
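> > > For example, something along these lines (a rough sketch; the
> > > file names and interval are arbitrary) collects comparable
> > > snapshots:
> > >
> > >   while true; do
> > >           ts=$(date +%Y%m%d%H%M)
> > >           vmstat -z > /var/tmp/vmstat-z.$ts
> > >           vmstat -m > /var/tmp/vmstat-m.$ts
> > >           sleep 3600
> > >   done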
> >
> > Ok, I hope I'm reading these outputs correctly...
> > Looking at vmstat -z, I am assuming the 'size' column shows the size
> > of each item in the zone and 'used' indicates the number of items in
> > use? (A quick look at vmstat.c pointing me to memstat_get_* suggests
> > I'm on the right track.) This results in numbers of about the right
> > order of magnitude to match my memory.
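> >
> > As a sanity check on that reading, an awk one-liner along these
> > lines computes used * size per zone, assuming vmstat -z's usual
> > "ITEM: SIZE, LIMIT, USED, FREE, ..." column layout ($2 = SIZE and
> > $4 = USED after splitting on ':' and ','):
> >
> >   vmstat -z | awk -F'[:,]' 'NF > 2 { print $1, $2 * $4 }'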
> >
> > I have taken 3 samples over the last 18 hours, in which time it
> > looks like about half of my memory has become wired. That seems a
> > little excessive, especially considering ZFS is only using about
> > 6.5G according to top:
> >
> > Mem: 1568M Active, 12G Inact, 656M Laundry, 36G Wired, 994M Buf, 12G Free
> > ARC: 6645M Total, 3099M MFU, 2617M MRU, 768K Anon, 49M Header, 877M Other
> >      4995M Compressed, 5803M Uncompressed, 1.16:1 Ratio
> > Swap: 8192M Total, 198M Used, 7993M Free, 2% Inuse
> >
> > In the middle of this range I was building about 1000 packages in
> > poudriere, so it's been busy.
> >
> > Interestingly, the ZFS ARC size has actually dropped since 9 hours
> > ago when I took the 2nd measurement (it was about 15G then), but
> > that was at the height of the build, so it suggests the ARC is
> > happily expiring older data.
> >
> > So assuming the used * size is correct I saw the following big changes
> > in vmstat -z:
> >
> > vm_page:
> >
> > 18 hours ago (before build): 18159063040, 25473990656
> >
> > 9 hours ago (during build) : 27994304512, 29363249152
> > delta                      : +9835241472, +3889258496
> >
> > recent sample              : 14337658880, 35773743104
> > delta                      : -13656645632, +6410493952
> >
> >
> > NAMEI:
> >
> > 18 hours ago:     2 267 478 016
> >
> > 9 hours ago :    13 991 848 960
> > delta       :   +11 724 370 944
> >
> > recent sample:   24 441 244 672
> > delta        :  +10 449 395 712
> >
> >
> > zfs_znode_cache:
> >
> > 18 hours ago: 370777296
> >
> > 9 hours ago : 975800816
> > delta       : +605023520
> >
> > recent sample: 156404656
> > delta        : -819396160
> >
> > VNODE:
> >
> > 18 hours ago: 440384120
> >
> > 9 hours ago : 952734200
> > delta       : +512350080
> >
> > recent sample: 159528160
> > delta        : -793206040
> >
> > Everything else comes out to smaller numbers, so I assume it's probably
> > not them.
> >
> > If I'm getting the numbers right, I'm seeing various caches expire
> > after the poudriere build finished. But that NAMEI number still
> > seems to be growing quite extensively; I don't know if that's
> > expected or not :)
> Are you running the nfsd?
>
> I ask because there might be a pretty basic blunder in the NFS server.
> There are several places where the NFS server code calls namei()
> and they don't do a NDFREE_PNBUF() after the call.
> All but one of them are related to the pNFS server, so those would
> not affect anyone (no one uses the pNFS server), but the remaining
> one is used to update the V4 export list (a function called
> nfsrv_v4rootexport()).
>
> So Kostik, should there be a NDFREE_PNBUF() after a successful
> namei() call to get rid of the buffer?
So, I basically answered the question myself. After mjg@'s commit
of Sep. 17, 2022 (5b5b7e2 in main), the pathname buffer is always
saved unless there is an error return, so the caller must free it.
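
A minimal sketch of the fixed pattern (illustrative only, not the
exact server code; the NDINIT() flags and "path" are just examples):

    /* Kernel context; needs <sys/param.h>, <sys/namei.h>, <sys/vnode.h>. */
    struct nameidata nd;
    int error;

    NDINIT(&nd, LOOKUP, FOLLOW | LOCKLEAF, UIO_SYSSPACE, path);
    error = namei(&nd);
    if (error == 0) {
            /* ... use the locked vnode nd.ni_vp ... */
            vput(nd.ni_vp);
            /*
             * Release the pathname buffer that namei() saved.
             * Without this, the NAMEI zone grows by one per call.
             */
            NDFREE_PNBUF(&nd);
    }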

The "vmstat -z | fgrep NAMEI" count does increase by one each
time I send a SIGHUP to mountd.
This is fixed by adding a NDFREE_PNBUF().
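
A quick way to see it on an unpatched server (pkill is just one way
to send the SIGHUP):

  vmstat -z | fgrep NAMEI    # note the USED column
  pkill -HUP mountd
  vmstat -z | fgrep NAMEI    # USED has gone up by one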

However, one leaked buffer each time the exports are reloaded is
probably not the leak you guys are looking for.

rick

>
> rick
>
> >
> > I will keep watching these, and hopefully get a sample after the
> > machine has started killing processes.
> >
> > If any gurus would like .xml dumps of the vmstat -z & -m outputs, I
> > have them available (XML is easier for me to import into a
> > spreadsheet); I can email them or upload them somewhere suitable.
> >
> > Darrin
> >
> > --
> >
> > =b
> >