Re: vnode_init took over 6 minutes

From: Mateusz Guzik <mjguzik_at_gmail.com>
Date: Tue, 31 Oct 2023 14:16:00 UTC
On 10/31/23, John F Carr <jfc@mit.edu> wrote:
> During a poudriere build control-T showed an awk process was stuck
> for over 6 minutes trying to acquire a lock in vnode_init.
>
> load: 8.58  cmd: awk 51238 [running] 392.88r 0.00u 392.88s 100% 2604k
> __mtx_lock_sleep+0xf8 __mtx_lock_flags+0xa4 vnode_init+0xc3
> keg_alloc_slab+0x277 zone_import+0x143 cache_alloc+0x3ed
> cache_alloc_retry+0x23 getnewvnode_reserve+0x20 zfs_zget+0x1f
> zfs_dirent_lookup+0x16d zfs_dirlook+0x7f zfs_lookup+0x3e0
> zfs_freebsd_cachedlookup+0x74 vfs_cache_lookup+0xa7
> cache_fplookup_noentry+0x241 cache_fplookup+0x575 namei+0x1ea
> vn_open_cred+0x48d
>
> The stack trace was the same for several minutes.  System CPU time was
> increasing.
>
> Address vnode_init+0xc3 corresponds to the mtx_lock call here.
>
> 	vp->v_holdcnt = VHOLD_NO_SMR;
> 	vp->v_type = VNON;
> 	mtx_lock(&vnode_list_mtx);
> 	TAILQ_INSERT_BEFORE(vnode_list_free_marker, vp, v_vnodelist);
> 	mtx_unlock(&vnode_list_mtx);
> 	return (0);
>
> Address __mtx_lock_sleep+0xf8 is the instruction after a call to
> lock_delay.
>
> ps says the command line was
>
> /usr/bin/awk -f /usr/bin/awk
> old-default/2023-10-31_08h21m03s/.poudriere.builders
> old-default/2023-10-31_08h21m03s/.poudriere.buildname ...
>
> with the full list of input files exceeding the ~2KB command line length
> limit of ps.
> "/usr/bin/awk" is probably not the real second argument.  It would cause an
> immediate syntax error.
>
> The hang resolved within a few more minutes and poudriere is continuing
> happily.
>
> I have never seen such behavior before.  Code in vfs_subr.c tries not to
> hold the
> vnode_list_mtx lock for too long.
>
> Any thoughts?
>
>
> FreeBSD 13.2-STABLE up through commit b180f0040f95, 24 core 48 thread Zen
> 2,
> zfs pool on SSD connected via 12 Gbps SAS with cache on NVME, 160 GB RAM.
>

what does "sysctl vfs.vnode" say


-- 
Mateusz Guzik <mjguzik gmail.com>