From nobody Sat Nov 15 02:12:07 2025 X-Original-To: dev-commits-src-main@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4d7cvr6RtYz6GqwG; Sat, 15 Nov 2025 02:12:08 +0000 (UTC) (envelope-from jhb@FreeBSD.org) Received: from smtp.freebsd.org (smtp.freebsd.org [IPv6:2610:1c1:1:606c::24b:4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "smtp.freebsd.org", Issuer "R13" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4d7cvr5rVdz3www; Sat, 15 Nov 2025 02:12:08 +0000 (UTC) (envelope-from jhb@FreeBSD.org) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org; s=dkim; t=1763172728; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=3CEIgZmKJJwMnw9pK1OBRqqLPCyn+1Lr1aAUI69q3A8=; b=aa9gXKFizSBdeQvjgtWeG87O4G0gMNzpG4a3ppN3gC28v+1+7KGCiHG0Sj3MhpvZO32s71 4erGoMlz0nNHmRGev0fbAVs0FHSf83Qs1kK05fSTONj6QfyEwKUPDw6VKsVOzvyo9ktqda i3ejUtQzf2D12ZUtfB1PQMiC8kZJVklcjbMvQ4eBhGjtD86JKW/75TUaSS9M2YgIDZGANm 3O3DjALfCzyuDNA3szTZ0p3gvcxmiltSt27iGTCL68C/R0oJaS7je/gHShfLeiG+3x4Mh+ iyeuG7Px52qhAA0PSvX3O/IdUnoy95g8QEHDqI2RHosWX5A8ijhwPPdjP6luLQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org; s=dkim; t=1763172728; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=3CEIgZmKJJwMnw9pK1OBRqqLPCyn+1Lr1aAUI69q3A8=; b=bIfjiEfQ/VlrJ9O23df9UJw5Z5kmvhtNQN/scWNVl2YHVIhm7SX6SSjcYZ1a3TdY2EnJud aIKiDyPnMRWyzPKZQQF5p+K9ft7VwodM8s9A5dbviuty4I2buCWt8PFiyizkAPMb9mZPVL UujtdLBI1By5p67i055FxA1SqJWGDI95KZQcqXBh/6I7RkbWaWMm9q4cRm3np2DkNBFxwq F2lJES3YlTQK8zkRYq2WVeIS1/Q1jI5goWtULqsqfLGHIw9UbOWHUqIYtZp4XXsV1qDQ/C Fi3hbde1nUHJn6hir7nMHLFpefwk5IkkZgV1VbMjOMOWJNB1p1OyLGegQ4bD3g== ARC-Seal: i=1; s=dkim; d=freebsd.org; t=1763172728; a=rsa-sha256; cv=none; b=hio/vVHyiHUm7V8Ezq5Z3tTqmK5+bXxI9A/Y8faTLmrPUiBsZtxRZd8Eqswy8a88St6NhW 3+8BvhsW8jA57r+H1v8Thy8iHyVpF5c+2ihud5G1zyN9US+j5joL1FxeE/0sMJbOJfS8a2 mrZxqQkNiMWmuaGX5FhW5haJ/5N39ySDxeNJr1BJwcgQjHH/fpmCtdUm9uRMLufsuRfGxe abi2jnJsGp3Pt3mcUh3G2Lcgg9b1GUz2BuN39A+Im5E9lI/YLIX78SMJ9+MBvjDphwsZqP 4YlLc0ydFg5SfpC5YLXYTsXQO2FxV2XIQB98NgaOYw/Ag/6H8Sx2vnq1KFXngg== ARC-Authentication-Results: i=1; mx1.freebsd.org; none Received: from [IPV6:2601:5c0:4202:5670:b08a:fd82:d04a:e316] (unknown [IPv6:2601:5c0:4202:5670:b08a:fd82:d04a:e316]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (Client did not present a certificate) (Authenticated sender: jhb) by smtp.freebsd.org (Postfix) with ESMTPSA id 4d7cvr476lz11gn; Sat, 15 Nov 2025 02:12:08 +0000 (UTC) (envelope-from jhb@FreeBSD.org) Message-ID: <854dff8a-555a-4921-ae64-4c445c7a03f7@FreeBSD.org> Date: Fri, 14 Nov 2025 21:12:07 -0500 List-Id: Commit messages for the main branch of the src repository List-Archive: https://lists.freebsd.org/archives/dev-commits-src-main List-Help: List-Post: List-Subscribe: List-Unsubscribe: X-BeenThere: dev-commits-src-main@freebsd.org Sender: owner-dev-commits-src-main@FreeBSD.org MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: git: 99cb3dca4773 - main - vnode: Rework vput() to avoid holding the vnode lock after decrementing Content-Language: en-US To: Mark Johnston , src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-main@FreeBSD.org References: <202511111459.5ABEx8Co056214@gitrepo.freebsd.org> From: John Baldwin In-Reply-To: <202511111459.5ABEx8Co056214@gitrepo.freebsd.org> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit On 11/11/25 09:59, Mark Johnston wrote: > The branch main has been updated by markj: > > URL: https://cgit.FreeBSD.org/src/commit/?id=99cb3dca4773fe4a16c500f9cb55fcd62cd8d7f3 > > commit 99cb3dca4773fe4a16c500f9cb55fcd62cd8d7f3 > Author: Mark Johnston > AuthorDate: 2025-11-11 14:47:06 +0000 > Commit: Mark Johnston > CommitDate: 2025-11-11 14:58:59 +0000 > > vnode: Rework vput() to avoid holding the vnode lock after decrementing > > It is not safe to modify the vnode structure after releasing one's > reference. Modify vput() to avoid this. Use refcount_release_if_last() > to opportunistically call vput_final() with the vnode lock held since we > need the vnode lock in order to deactivate the vnode, and it's silly to > drop the vnode lock and immediately reacquire it in this common case. > > Note that vunref() has a similar flaw. D52628 aims to fix the problem > more holistically, but this change fixes observable panics in the > meantime. > > Reported by: syzbot+6676b3ff282d590b0fb3@syzkaller.appspotmail.com > Reported by: syzbot+38e26cf6f959e886f110@syzkaller.appspotmail.com > Reviewed by: kib > MFC after: 3 days > Differential Revision: https://reviews.freebsd.org/D52608 > --- > sys/kern/vfs_subr.c | 7 ++++--- > 1 file changed, 4 insertions(+), 3 deletions(-) > > diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c > index 58975f7ac932..9cf983f6f89d 100644 > --- a/sys/kern/vfs_subr.c > +++ b/sys/kern/vfs_subr.c > @@ -3713,11 +3713,12 @@ vput(struct vnode *vp) > > ASSERT_VOP_LOCKED(vp, __func__); > ASSERT_VI_UNLOCKED(vp, __func__); > - if (!refcount_release(&vp->v_usecount)) { > - VOP_UNLOCK(vp); > + if (refcount_release_if_last(&vp->v_usecount)) { > + vput_final(vp, VPUT); > return; > } > - vput_final(vp, VPUT); > + VOP_UNLOCK(vp); > + vrele(vp); > } This commit results in reliable hangs for me when I do a 'make -j 16 buildkernel' in a VM where the source directory is mounted over NFS. Even if there is nothing to build so the make is just traversing all the directories. The hung process is stuck on a vnode lock. Some of the stacks from procstat: PID TID COMM TDNAME KSTACK 1858 100343 make - mi_switch+0x172 sleepq_switch+0x109 sleeplk+0x110 lockmgr_slock_hard+0x3c3 VOP_LOCK1_APV+0x32 nfs_lock+0x2c vop_sigdefer+0x30 VOP_LOCK1_APV+0x32 _vn_lock+0x53 vget_finish+0xaf cache_lookup+0x2be nfs_lookup+0x4e1 vop_sigdefer+0x30 VOP_LOOKUP_APV+0x57 vfs_lookup+0x5aa namei+0x35d vn_open_cred+0x592 openatfp+0x2b7 root@head:~ # procstat -kk 1860 1860 100360 make - mi_switch+0x172 sleepq_switch+0x109 sleeplk+0x110 lockmgr_slock_hard+0x3c3 VOP_LOCK1_APV+0x32 nfs_lock+0x2c vop_sigdefer+0x30 VOP_LOCK1_APV+0x32 _vn_lock+0x53 vget_finish+0xaf cache_lookup+0x2be nfs_lookup+0x4e1 vop_sigdefer+0x30 VOP_LOOKUP_APV+0x57 vfs_lookup+0x5aa namei+0x35d vn_open_cred+0x592 openatfp+0x2b7 root@head:~ # procstat -kk 1855 1855 100314 make - mi_switch+0x172 sleepq_switch+0x109 sleeplk+0x110 lockmgr_xlock_hard+0x3d0 VOP_LOCK1_APV+0x32 nfs_lock+0x2c vop_sigdefer+0x30 VOP_LOCK1_APV+0x32 _vn_lock+0x53 vput_final+0x269 vput+0xab vfs_lookup+0xa7a namei+0x35d vn_open_cred+0x592 openatfp+0x2b7 sys_open+0x28 amd64_syscall+0x169 fast_syscall_common+0xf8 This last one is kind of interesting as it calls vput_final? (kgdb) p vp $2 = (struct vnode *) 0xfffff8016bb6c898 (kgdb) p vp->v_lock $3 = {lock_object = {lo_name = 0xffffffff81467118 "nfs", lo_flags = 117112832, lo_data = 0, lo_witness = 0xfffff8043fd83900}, lk_lock = 39, lk_exslpfail = 0, lk_pri = 44, lk_timo = 6} (kgdb) p/x vp->v_lock->lk_lock $4 = 0x27 So there is one shared lock, and both shared and exclusive waiters. Seems like a vnode lock has been leaked? Or rather, the vput_final() in this case is called from vrele(), so the vput() for 1855 has hit the new path. I wonder if when it dropped its lock via VOP_UNLOCK() it is no longer eligible to re-lock it as it needs some other sharer to also unlock before it can re-acquire the lock in vput_final? LK_UPGRADE attempts don't block if there are shared waiters but only one 1 shared lock held, but if you do LK_UNLOCK and then LK_XLOCK that will indeed honor the other waiters and not acquire the lock, so a vput that held the last shared lock would have "worked" to ugprade before even with other waiters, but now it can drop the lock and then get stuck. I guess the question though in this case is that some other thread must have snuck in and acquired a shared lock between the VOP_UNLOCK and the vrele() and is blocked while holding it. Aha, so most of the other makes are stuck in VOP_LOOKUP on this same vnode (which I believe is "/usr/src/sys/dev" as in 1855 the vput() is inside of namei() and this is the state of ndp: (kgdb) p *ndp $9 = { ni_dirp = 0x481afdaa2d7d , ni_segflg = UIO_USERSPACE, ni_rightsneeded = 0xfffffe00d83d2d90, ni_startdir = 0x0, ni_rootdir = 0xfffff80005e461b8, ni_topdir = 0x0, ni_dirfd = -100, ni_lcf = 0, ni_filecaps = {fc_rights = {cr_rights = {0, 0}}, fc_ioctls = 0x0, fc_nioctls = -1, fc_fcntls = 0}, ni_vp = 0xfffff801bc007000, ni_dvp = 0xfffff8016bb6c898, ni_resflags = 1, ni_debugflags = 3, ni_loopcnt = 0, ni_pathlen = 1, ni_next = 0xfffff8000a0b0817 "", ni_cnd = {cn_flags = 8936259908, cn_cred = 0xfffff80005e53c00, cn_nameiop = LOOKUP, cn_lkflags = 524288, cn_pnbuf = 0xfffff8000a0b0800 "/usr/src/sys/dev/acpica", cn_nameptr = 0xfffff8000a0b0811 "acpica", cn_namelen = 6}, ni_cap_tracker = {tqh_first = 0x0, tqh_last = 0xfffffe00d83d2d48}, ni_rbeneath_dpp = 0x0, ni_nctrack_mnt = 0x0, ni_dvp_seqc = 0, ni_vp_seqc = 0} But pid 1844 is a make blocked on another vnode: (kgdb) p vp $24 = (struct vnode *) 0xfffff801bc007000 And that one appears to be the "acpica" child directory: (kgdb) p *vp->v_cache_dst.tqh_first $27 = {nc_src = {le_next = 0xfffff801bc01b9b8, le_prev = 0xfffff801bc01b898}, nc_dst = {tqe_next = 0x0, tqe_prev = 0xfffff801bc007058}, nc_hash = { csle_next = 0x0}, nc_dvp = 0xfffff8016bb6c898, n_un = { nu_vp = 0xfffff801bc007000, nu_neg = {neg_flag = 0 '\000', neg_hit = 112 'p'}}, nc_flag = 12 '\f', nc_nlen = 6 '\006', nc_name = 0xfffff801bc01b962 "acpica"} And note it's nc_dvp is the vnode 1855 is trying to vput and all the other makes are trying to VOP_LOCK in nfs_lookup(). So my guess is that 1844 was able to VOP_LOCK() "/usr/src/sys/dev" in the window between VOP_UNLOCK() and the vput_final() invoked from vrele(). Humm, except that the vnode in question shouldn't be in vput_final at all?? Back to 1855 (proc doing vput_final): (kgdb) #12 0xffffffff80c9bc89 in vput_final (vp=0xfffff8016bb6c898, func=func@entry=VRELE) at /usr/src/sys/kern/vfs_subr.c:3628 3628 error = vn_lock(vp, LK_EXCLUSIVE | LK_INTERLOCK); (kgdb) p vp $6 = (struct vnode *) 0xfffff8016bb6c898 (kgdb) p vp->v_usecount $7 = 14 (kgdb) p vp->v_holdcnt $8 = 7 Or did the other make's manage to acquire new use counts while 1855 was blocked on the lock in vput_final? Anyway, it seems 14 make's are blocked on this vnode (/usr/src/sys/dev). 1844 is blocked on the "acpica" vnode, and another make process (1856) is blocked on "acpica", but via getdirentries() instead of namei(): #4 0xffffffff80b62ec0 in sleeplk (lk=lk@entry=0xfffff801bc007070, flags=flags@entry=2098176, ilk=ilk@entry=0xfffff801bc007098, wmesg=wmesg@entry=0xffffffff81467118 "nfs", pri=, pri@entry=44, timo=timo@entry=6, queue=1) at /usr/src/sys/kern/kern_lock.c:302 #5 0xffffffff80b60fc3 in lockmgr_slock_hard (lk=0xfffff801bc007070, flags=2098176, ilk=0xfffff801bc007098, file=, line=4333, lwa=0x0) at /usr/src/sys/kern/kern_lock.c:693 #6 0xffffffff811b97a2 in VOP_LOCK1_APV ( vop=0xffffffff81aeeb38 , a=a@entry=0xfffffe00d8445c88) at vnode_if.c:2103 #7 0xffffffff80a6050c in nfs_lock (ap=ap@entry=0xfffffe00d8445c88) at /usr/src/sys/fs/nfsclient/nfs_clvnops.c:344 #8 0xffffffff80c83d90 in vop_sigdefer (vop=, a=0xfffffe00d8445c88) at /usr/src/sys/kern/vfs_default.c:1502 #9 0xffffffff811b97a2 in VOP_LOCK1_APV ( vop=0xffffffff81aada28 , a=a@entry=0xfffffe00d8445c88) at vnode_if.c:2103 #10 0xffffffff80cb6b43 in VOP_LOCK1 (vp=0xfffff801bc007000, flags=2098176, file=0xffffffff81359deb "/usr/src/sys/kern/vfs_syscalls.c", line=4333) at ./vnode_if.h:1316 #11 _vn_lock (vp=vp@entry=0xfffff801bc007000, flags=flags@entry=2098176, file=, line=, line@entry=4333) at /usr/src/sys/kern/vfs_vnops.c:1970 #12 0xffffffff80cb2d94 in kern_getdirentries (td=0xfffff80006cb5000, fd=, buf=0x2088dba27000 , count=4096, basep=basep@entry=0xfffffe00d8445df0, residp=residp@entry=0x0, bufseg=UIO_USERSPACE) at /usr/src/sys/kern/vfs_syscalls.c:4333 #13 0xffffffff80cb32a9 in sys_getdirentries (td=, uap=0xfffff80006cb5428) at /usr/src/sys/kern/vfs_syscalls.c:4287 The "acpica" lock is held by a writer at least: (kgdb) p vp $9 = (struct vnode *) 0xfffff801bc007000 (kgdb) p vp->v_lock $10 = {lock_object = {lo_name = 0xffffffff81467118 "nfs", lo_flags = 117112832, lo_data = 0, lo_witness = 0xfffff8043fd83900}, lk_lock = 18446735277917198210, lk_exslpfail = 0, lk_pri = 44, lk_timo = 6} (kgdb) p/x vp->v_lock.lk_lock $14 = 0xfffff80011ebd782 (kgdb) set $td = (struct thread *)0xfffff80011ebd780 (kgdb) p $td->td_tid $15 = 100314 (kgdb) p $td->td_proc->p_comm $16 = "make", '\000' (kgdb) p $td->td_proc->p_pid $17 = 1855 Oh dear, so that's the deadlock. 1855 needs to get an exclusive lock on "/usr/src/sys/dev/" to finish its vput() _while_ it holds an exclusive lock on "/usr/src/sys/dev/acpica". And the other shared lock on "/usr/src/sys/dev" is held by 1844 who is blocked trying to share lock "/usr/src/sys/dev/acpica" during lookup(). So by doing the VOP_UNLOCK() and then reacquiring the lock via vrele(), you've introduced a vnode LOR as the vput() of the parent directory can now try to re-lock the vnode of the parent directory while we hold a vnode on the child vnode. -- John Baldwin