From nobody Sat Nov 15 04:18:57 2025 X-Original-To: dev-commits-src-main@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4d7gkb3bS5z6H2C1 for ; Sat, 15 Nov 2025 04:19:19 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: from mail-ej1-x636.google.com (mail-ej1-x636.google.com [IPv6:2a00:1450:4864:20::636]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smtp.gmail.com", Issuer "WR4" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4d7gkZ5lSWz4FZr for ; Sat, 15 Nov 2025 04:19:18 +0000 (UTC) (envelope-from mjguzik@gmail.com) Authentication-Results: mx1.freebsd.org; none Received: by mail-ej1-x636.google.com with SMTP id a640c23a62f3a-b7277324054so356094166b.0 for ; Fri, 14 Nov 2025 20:19:18 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1763180351; x=1763785151; darn=freebsd.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=aUT5bXXLm4N92s+CfDVErEhIJ/cJx6NkmRn+d5ynf7I=; b=WlWy60dzCpfG796VDwpkv7zrjvGjs6tZ3ZWVdEOMyokNzc+dMnC9jDJGhmtVGDB4VI D5AyiC26G+9E2NuIT0n760kdMBpQEFU230eyKLuYu0jM3GxTvm3ALHMKfs8aUc/spoOT sDq5P7YcyOVpHco+VbpZzZXyv6ic/gzflpxfubepinkiJC++LFLj2yLnTt/UqFK+XUJD FeLjP7fYZGyNKIJJoPQHdntT1RoH/X1tAwAlCUDzEN7/210lsNpwoo7MwaN6fUWN3rde 3LD5eyivYjN4jscJjVkn++7+CMUwlu+234m+XW31okmfpzLBsxJT0zflp4xijjaRsCPz nX+A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1763180351; x=1763785151; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=aUT5bXXLm4N92s+CfDVErEhIJ/cJx6NkmRn+d5ynf7I=; b=Ua0N/ZUpygaF3+eqjf/+oij8O8nT8Fl67KVKZn8hbuXS87gOePgGsuH6e5stBjoYdT N31Ji7w6zA5FqA7MN1HZQMOqbJSO1Vb8WpHt5ppfbtUhfXR/yYjjT8TmrWZb5RO4Hfu8 MpNvpKjpELjmM9wuZdhSZ6yi7bUDUB1YiWS5FzdRrihT5/XxBE6GTw+HvXqg3vJ3/L0z tMJbcB0uoYXN+PeaZsbHvdNSdQFe0KJmbM6kfaQS8ZiP1+qXjhI1cHjlZgwVwzYc/J36 /YZ3gB31uhQCxqzFAmHBKYJndBY3BaqwTd5ySbjjHQyjH9+Cd8z67CaG0NG1Cw+6czsL mPQg== X-Forwarded-Encrypted: i=1; AJvYcCW7NbzDSVSsF1PDzbpboXptqurvVIZnyY4Js4jP59N5nLSUwEVmrwf7g0qoLeJk6nyJ+qYwNqcgkE7KcipDB8/x11nqNA==@freebsd.org X-Gm-Message-State: AOJu0Yy3wSyRqjoRtCvLloYbAL8wmNPtebx5g7UeWOTxdN7qldlnql22 VJyFpLItD+CrZidYpPzSwyFYnUmuQRE2h9iu19ox3Vq4OsNms3wAjhlqXkXh8Uoi8mFXb8gvELS 6InSX63BdbOmvBZNKaHef8X4y2SbuR8A= X-Gm-Gg: ASbGncvML1PWnNYg30zOwdU3Tg0J/9tys2mPMkI7jSndeuVxJVMTJ0ntmZ0mgEDk7Sy ww3EJKWlGJ2uj3YObC9iD4Q7uls6LZBiCrtAgalX7DOeD3umlAHAtG9hYk8nOKxBkmvEa6PvqqB RtKxFBgI+jNo+MoFI9PYVieHwtzoN/JVM0G3rw1Iu+Um6O6o8oS8yIn92Rb5Uut1NNIrH3tP2Tw Tqm69MhUEa3VI+j0Kn16z0tDvgHMBCrapkm958gbV2oJ1OQC31cOyvi7yA03xR7wJWMwrnuI0wq raqqjjQqxh4SK/4XXo3QdjonDw== X-Google-Smtp-Source: AGHT+IHm9rPNSP1UkIo/fjsnYfqraol0Q4KK8hTpR6tsP7q/iFBUQTA9v0/PfrTLfjQoO5Dv9rJIoOK91hkN3TtlebA= X-Received: by 2002:a17:907:86a2:b0:b73:56a7:a36e with SMTP id a640c23a62f3a-b7367949692mr542688866b.50.1763180350360; Fri, 14 Nov 2025 20:19:10 -0800 (PST) List-Id: Commit messages for the main branch of the src repository List-Archive: https://lists.freebsd.org/archives/dev-commits-src-main List-Help: List-Post: List-Subscribe: List-Unsubscribe: X-BeenThere: dev-commits-src-main@freebsd.org Sender: owner-dev-commits-src-main@FreeBSD.org MIME-Version: 1.0 References: <202511111459.5ABEx8Co056214@gitrepo.freebsd.org> <854dff8a-555a-4921-ae64-4c445c7a03f7@FreeBSD.org> In-Reply-To: <854dff8a-555a-4921-ae64-4c445c7a03f7@FreeBSD.org> From: Mateusz Guzik Date: Sat, 15 Nov 2025 05:18:57 +0100 X-Gm-Features: AWmQ_bl_ecqp3ezWnrMSoiEgDDPsdHU-iF5yAg0MJ_LUgev4cFkxS8G84oRiVHA Message-ID: Subject: Re: git: 99cb3dca4773 - main - vnode: Rework vput() to avoid holding the vnode lock after decrementing To: John Baldwin Cc: Mark Johnston , src-committers@freebsd.org, dev-commits-src-all@freebsd.org, dev-commits-src-main@freebsd.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spamd-Bar: ---- X-Spamd-Result: default: False [-4.00 / 15.00]; REPLY(-4.00)[]; ASN(0.00)[asn:15169, ipnet:2a00:1450::/32, country:US] X-Rspamd-Pre-Result: action=no action; module=replies; Message is reply to one we originated X-Rspamd-Queue-Id: 4d7gkZ5lSWz4FZr On Sat, Nov 15, 2025 at 3:12=E2=80=AFAM John Baldwin wrot= e: > > On 11/11/25 09:59, Mark Johnston wrote: > > The branch main has been updated by markj: > > > > URL: https://cgit.FreeBSD.org/src/commit/?id=3D99cb3dca4773fe4a16c500f9= cb55fcd62cd8d7f3 > > > > commit 99cb3dca4773fe4a16c500f9cb55fcd62cd8d7f3 > > Author: Mark Johnston > > AuthorDate: 2025-11-11 14:47:06 +0000 > > Commit: Mark Johnston > > CommitDate: 2025-11-11 14:58:59 +0000 > > > > vnode: Rework vput() to avoid holding the vnode lock after decreme= nting > > > > It is not safe to modify the vnode structure after releasing one's > > reference. Modify vput() to avoid this. Use refcount_release_if_= last() > > to opportunistically call vput_final() with the vnode lock held si= nce we > > need the vnode lock in order to deactivate the vnode, and it's sil= ly to > > drop the vnode lock and immediately reacquire it in this common ca= se. > > > > Note that vunref() has a similar flaw. D52628 aims to fix the pro= blem > > more holistically, but this change fixes observable panics in the > > meantime. > > > > Reported by: syzbot+6676b3ff282d590b0fb3@syzkaller.appspotmail.= com > > Reported by: syzbot+38e26cf6f959e886f110@syzkaller.appspotmail.= com > > Reviewed by: kib > > MFC after: 3 days > > Differential Revision: https://reviews.freebsd.org/D52608 > > --- > > sys/kern/vfs_subr.c | 7 ++++--- > > 1 file changed, 4 insertions(+), 3 deletions(-) > > > > diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c > > index 58975f7ac932..9cf983f6f89d 100644 > > --- a/sys/kern/vfs_subr.c > > +++ b/sys/kern/vfs_subr.c > > @@ -3713,11 +3713,12 @@ vput(struct vnode *vp) > > > > ASSERT_VOP_LOCKED(vp, __func__); > > ASSERT_VI_UNLOCKED(vp, __func__); > > - if (!refcount_release(&vp->v_usecount)) { > > - VOP_UNLOCK(vp); > > + if (refcount_release_if_last(&vp->v_usecount)) { > > + vput_final(vp, VPUT); > > return; > > } > > - vput_final(vp, VPUT); > > + VOP_UNLOCK(vp); > > + vrele(vp); > > } > > This commit results in reliable hangs for me when I do a 'make -j 16 buil= dkernel' > in a VM where the source directory is mounted over NFS. Even if there is= nothing > to build so the make is just traversing all the directories. The hung pr= ocess is > stuck on a vnode lock. > > Some of the stacks from procstat: > > PID TID COMM TDNAME KSTACK > 1858 100343 make - mi_switch+0x172 sle= epq_switch+0x109 sleeplk+0x110 lockmgr_slock_hard+0x3c3 VOP_LOCK1_APV+0x32 = nfs_lock+0x2c vop_sigdefer+0x30 VOP_LOCK1_APV+0x32 _vn_lock+0x53 vget_finis= h+0xaf cache_lookup+0x2be nfs_lookup+0x4e1 vop_sigdefer+0x30 VOP_LOOKUP_APV= +0x57 vfs_lookup+0x5aa namei+0x35d vn_open_cred+0x592 openatfp+0x2b7 > root@head:~ # procstat -kk 1860 > 1860 100360 make - mi_switch+0x172 sle= epq_switch+0x109 sleeplk+0x110 lockmgr_slock_hard+0x3c3 VOP_LOCK1_APV+0x32 = nfs_lock+0x2c vop_sigdefer+0x30 VOP_LOCK1_APV+0x32 _vn_lock+0x53 vget_finis= h+0xaf cache_lookup+0x2be nfs_lookup+0x4e1 vop_sigdefer+0x30 VOP_LOOKUP_APV= +0x57 vfs_lookup+0x5aa namei+0x35d vn_open_cred+0x592 openatfp+0x2b7 > root@head:~ # procstat -kk 1855 > 1855 100314 make - mi_switch+0x172 sle= epq_switch+0x109 sleeplk+0x110 lockmgr_xlock_hard+0x3d0 VOP_LOCK1_APV+0x32 = nfs_lock+0x2c vop_sigdefer+0x30 VOP_LOCK1_APV+0x32 _vn_lock+0x53 vput_final= +0x269 vput+0xab vfs_lookup+0xa7a namei+0x35d vn_open_cred+0x592 openatfp+0= x2b7 sys_open+0x28 amd64_syscall+0x169 fast_syscall_common+0xf8 > > This last one is kind of interesting as it calls vput_final? > > (kgdb) p vp > $2 =3D (struct vnode *) 0xfffff8016bb6c898 > (kgdb) p vp->v_lock > $3 =3D {lock_object =3D {lo_name =3D 0xffffffff81467118 "= nfs", > lo_flags =3D 117112832, lo_data =3D 0, lo_witness =3D 0xfffff8043fd8= 3900}, > lk_lock =3D 39, lk_exslpfail =3D 0, lk_pri =3D 44, lk_timo =3D 6} > (kgdb) p/x vp->v_lock->lk_lock > $4 =3D 0x27 > > So there is one shared lock, and both shared and exclusive waiters. Seem= s like > a vnode lock has been leaked? Or rather, the vput_final() in this case i= s > called from vrele(), so the vput() for 1855 has hit the new path. I wond= er if > when it dropped its lock via VOP_UNLOCK() it is no longer eligible to re-= lock > it as it needs some other sharer to also unlock before it can re-acquire = the > lock in vput_final? LK_UPGRADE attempts don't block if there are shared = waiters > but only one 1 shared lock held, but if you do LK_UNLOCK and then LK_XLOC= K that > will indeed honor the other waiters and not acquire the lock, so a vput t= hat > held the last shared lock would have "worked" to ugprade before even with > other waiters, but now it can drop the lock and then get stuck. I guess = the > question though in this case is that some other thread must have snuck in > and acquired a shared lock between the VOP_UNLOCK and the vrele() and is > blocked while holding it. > > Aha, so most of the other makes are stuck in VOP_LOOKUP on this same vnod= e > (which I believe is "/usr/src/sys/dev" as in 1855 the vput() is inside of > namei() and this is the state of ndp: > > (kgdb) p *ndp > $9 =3D { > ni_dirp =3D 0x481afdaa2d7d , ni_segflg =3D UIO_USERSPACE, ni_rightsneeded =3D 0xfffffe00d8= 3d2d90, > ni_startdir =3D 0x0, ni_rootdir =3D 0xfffff80005e461b8, ni_topdir =3D = 0x0, > ni_dirfd =3D -100, ni_lcf =3D 0, ni_filecaps =3D {fc_rights =3D {cr_ri= ghts =3D {0, > 0}}, fc_ioctls =3D 0x0, fc_nioctls =3D -1, fc_fcntls =3D 0}, > ni_vp =3D 0xfffff801bc007000, ni_dvp =3D 0xfffff8016bb6c898, ni_resfla= gs =3D 1, > ni_debugflags =3D 3, ni_loopcnt =3D 0, ni_pathlen =3D 1, > ni_next =3D 0xfffff8000a0b0817 "", ni_cnd =3D {cn_flags =3D 8936259908= , > cn_cred =3D 0xfffff80005e53c00, cn_nameiop =3D LOOKUP, cn_lkflags = =3D 524288, > cn_pnbuf =3D 0xfffff8000a0b0800 "/usr/src/sys/dev/acpica", > cn_nameptr =3D 0xfffff8000a0b0811 "acpica", cn_namelen =3D 6}, > ni_cap_tracker =3D {tqh_first =3D 0x0, tqh_last =3D 0xfffffe00d83d2d48= }, > ni_rbeneath_dpp =3D 0x0, ni_nctrack_mnt =3D 0x0, ni_dvp_seqc =3D 0, ni= _vp_seqc =3D 0} > > But pid 1844 is a make blocked on another vnode: > > (kgdb) p vp > $24 =3D (struct vnode *) 0xfffff801bc007000 > > And that one appears to be the "acpica" child directory: > > (kgdb) p *vp->v_cache_dst.tqh_first > $27 =3D {nc_src =3D {le_next =3D 0xfffff801bc01b9b8, le_prev =3D 0xfffff8= 01bc01b898}, > nc_dst =3D {tqe_next =3D 0x0, tqe_prev =3D 0xfffff801bc007058}, nc_has= h =3D { > csle_next =3D 0x0}, nc_dvp =3D 0xfffff8016bb6c898, n_un =3D { > nu_vp =3D 0xfffff801bc007000, nu_neg =3D {neg_flag =3D 0 '\000', > neg_hit =3D 112 'p'}}, nc_flag =3D 12 '\f', nc_nlen =3D 6 '\006', > nc_name =3D 0xfffff801bc01b962 "acpica"} > > And note it's nc_dvp is the vnode 1855 is trying to vput and all the othe= r > makes are trying to VOP_LOCK in nfs_lookup(). > > So my guess is that 1844 was able to VOP_LOCK() "/usr/src/sys/dev" in the > window between VOP_UNLOCK() and the vput_final() invoked from vrele(). > > Humm, except that the vnode in question shouldn't be in vput_final at all= ?? > > Back to 1855 (proc doing vput_final): > > (kgdb) > #12 0xffffffff80c9bc89 in vput_final (vp=3D0xfffff8016bb6c898, > func=3Dfunc@entry=3DVRELE) at /usr/src/sys/kern/vfs_subr.c:3628 > 3628 error =3D vn_lock(vp, LK_EXCLUSIVE | LK_I= NTERLOCK); > (kgdb) p vp > $6 =3D (struct vnode *) 0xfffff8016bb6c898 > (kgdb) p vp->v_usecount > $7 =3D 14 > (kgdb) p vp->v_holdcnt > $8 =3D 7 > > Or did the other make's manage to acquire new use counts while 1855 was b= locked > on the lock in vput_final? > > Anyway, it seems 14 make's are blocked on this vnode (/usr/src/sys/dev). = 1844 > is blocked on the "acpica" vnode, and another make process (1856) is bloc= ked on > "acpica", but via getdirentries() instead of namei(): > > #4 0xffffffff80b62ec0 in sleeplk (lk=3Dlk@entry=3D0xfffff801bc007070, > flags=3Dflags@entry=3D2098176, ilk=3Dilk@entry=3D0xfffff801bc007098, > wmesg=3Dwmesg@entry=3D0xffffffff81467118 "nfs", > pri=3D, pri@entry=3D44, timo=3Dtimo@entry=3D6, queue= =3D1) > at /usr/src/sys/kern/kern_lock.c:302 > #5 0xffffffff80b60fc3 in lockmgr_slock_hard (lk=3D0xfffff801bc007070, > flags=3D2098176, ilk=3D0xfffff801bc007098, file=3D, l= ine=3D4333, > lwa=3D0x0) at /usr/src/sys/kern/kern_lock.c:693 > #6 0xffffffff811b97a2 in VOP_LOCK1_APV ( > vop=3D0xffffffff81aeeb38 , a=3Da@entry=3D0xfffffe0= 0d8445c88) > at vnode_if.c:2103 > #7 0xffffffff80a6050c in nfs_lock (ap=3Dap@entry=3D0xfffffe00d8445c88) > at /usr/src/sys/fs/nfsclient/nfs_clvnops.c:344 > #8 0xffffffff80c83d90 in vop_sigdefer (vop=3D, > a=3D0xfffffe00d8445c88) at /usr/src/sys/kern/vfs_default.c:1502 > #9 0xffffffff811b97a2 in VOP_LOCK1_APV ( > vop=3D0xffffffff81aada28 , a=3Da@entry=3D0xfffffe00= d8445c88) > at vnode_if.c:2103 > #10 0xffffffff80cb6b43 in VOP_LOCK1 (vp=3D0xfffff801bc007000, flags=3D209= 8176, > file=3D0xffffffff81359deb "/usr/src/sys/kern/vfs_syscalls.c", line= =3D4333) > at ./vnode_if.h:1316 > #11 _vn_lock (vp=3Dvp@entry=3D0xfffff801bc007000, flags=3Dflags@entry=3D2= 098176, > file=3D, line=3D, line@entry=3D4333) > at /usr/src/sys/kern/vfs_vnops.c:1970 > #12 0xffffffff80cb2d94 in kern_getdirentries (td=3D0xfffff80006cb5000, > fd=3D, > buf=3D0x2088dba27000 , count=3D4096, basep=3Dbasep@entry=3D0xfffffe00d8445df0, residp=3D= residp@entry=3D0x0, > bufseg=3DUIO_USERSPACE) at /usr/src/sys/kern/vfs_syscalls.c:4333 > #13 0xffffffff80cb32a9 in sys_getdirentries (td=3D, > uap=3D0xfffff80006cb5428) at /usr/src/sys/kern/vfs_syscalls.c:4287 > > The "acpica" lock is held by a writer at least: > > (kgdb) p vp > $9 =3D (struct vnode *) 0xfffff801bc007000 > (kgdb) p vp->v_lock > $10 =3D {lock_object =3D {lo_name =3D 0xffffffff81467118 = "nfs", > lo_flags =3D 117112832, lo_data =3D 0, lo_witness =3D 0xfffff8043fd8= 3900}, > lk_lock =3D 18446735277917198210, lk_exslpfail =3D 0, lk_pri =3D 44, l= k_timo =3D 6} > (kgdb) p/x vp->v_lock.lk_lock > $14 =3D 0xfffff80011ebd782 > (kgdb) set $td =3D (struct thread *)0xfffff80011ebd780 > (kgdb) p $td->td_tid > $15 =3D 100314 > (kgdb) p $td->td_proc->p_comm > $16 =3D "make", '\000' > (kgdb) p $td->td_proc->p_pid > $17 =3D 1855 > > Oh dear, so that's the deadlock. > > 1855 needs to get an exclusive lock on "/usr/src/sys/dev/" to finish its = vput() > _while_ it holds an exclusive lock on "/usr/src/sys/dev/acpica". And the= other > shared lock on "/usr/src/sys/dev" is held by 1844 who is blocked trying t= o share > lock "/usr/src/sys/dev/acpica" during lookup(). > > So by doing the VOP_UNLOCK() and then reacquiring the lock via vrele(), y= ou've > introduced a vnode LOR as the vput() of the parent directory can now try = to > re-lock the vnode of the parent directory while we hold a vnode on the > child vnode. > This should be easy to fix though. While the routine must not vrele, the VPUT should be patched to trylock (aka LK_NOWAIT) and EAGAIN if that fails, which in turn will punt the vnode to a workqueue. However, one has to wonder if failing to lock here will be frequent enough to constitute a problem. There is a counter for these (deferred_inactive). The following was only boot-tested: diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c index 58975f7ac932..5cde78162e6a 100644 --- a/sys/kern/vfs_subr.c +++ b/sys/kern/vfs_subr.c @@ -3640,11 +3640,8 @@ vput_final(struct vnode *vp, enum vput_op func) break; case VPUT: want_unlock =3D true; - if (VOP_ISLOCKED(vp) !=3D LK_EXCLUSIVE) { - error =3D VOP_LOCK(vp, LK_UPGRADE | LK_INTERLOCK | - LK_NOWAIT); - VI_LOCK(vp); - } + error =3D VOP_LOCK(vp, LK_EXCLUSIVE | LK_INTERLOCK | LK_NOW= AIT); + VI_LOCK(vp); break; case VUNREF: if (VOP_ISLOCKED(vp) !=3D LK_EXCLUSIVE) { @@ -3672,12 +3669,9 @@ vput_final(struct vnode *vp, enum vput_op func) } if (func =3D=3D VUNREF) vp->v_vflag &=3D ~VV_UNREF; - vdropl(vp); - return; out: - if (func =3D=3D VPUT) - VOP_UNLOCK(vp); vdropl(vp); + return; } /* @@ -3713,8 +3707,8 @@ vput(struct vnode *vp) ASSERT_VOP_LOCKED(vp, __func__); ASSERT_VI_UNLOCKED(vp, __func__); + VOP_UNLOCK(vp); if (!refcount_release(&vp->v_usecount)) { - VOP_UNLOCK(vp); return; } vput_final(vp, VPUT);