From nobody Thu Oct 27 00:24:29 2022 X-Original-To: dev-commits-src-main@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4MyRJF67H5z4gWQl; Thu, 27 Oct 2022 00:24:29 +0000 (UTC) (envelope-from git@FreeBSD.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "mxrelay.nyi.freebsd.org", Issuer "R3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4MyRJF5bDDz4PJN; Thu, 27 Oct 2022 00:24:29 +0000 (UTC) (envelope-from git@FreeBSD.org) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org; s=dkim; t=1666830269; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=p4jDkPVO1OwOCR8+pOMEiOTVwvEtlYIA3H6bDNN9aD4=; b=wTpKhK+7YPesEnmoLZFHtNE4QojvmxbtWBx6HRWb9UAv58o0s8Z/miU0GoZ8yY8s0K7ohu gnQjhhsxw2nDOv5C27iFdk0VP2XgNdnrjQqZpoOVZw9+LCn0R1pZ2xCvurmwog6srA5m8H UxbxSWAcbrKzBNzRqqlhfQdxiTfxHcJX29c3QWpzh4YDG+SB2snE5eGKHtlmLuNNcqm+dT lFRPce/HOt3O0u4TKC/azUAcRAd5hSWD+sTArJnWUuKbCwzu9TDpTqbwzDC+O2UKkAfVjy 8O4pBWp3yeQAZar2nUCT7uSxJpGUrbRAWHgcBnFh5jwQQHtjBOP68DinTFKyXA== Received: from gitrepo.freebsd.org (gitrepo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:5]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id 4MyRJF4g7kzd6Q; Thu, 27 Oct 2022 00:24:29 +0000 (UTC) (envelope-from git@FreeBSD.org) Received: from gitrepo.freebsd.org ([127.0.1.44]) by gitrepo.freebsd.org (8.16.1/8.16.1) with ESMTP id 29R0OTq5033518; Thu, 27 Oct 2022 00:24:29 GMT (envelope-from git@gitrepo.freebsd.org) Received: (from git@localhost) by gitrepo.freebsd.org (8.16.1/8.16.1/Submit) id 29R0OTbU033517; Thu, 27 Oct 2022 00:24:29 GMT (envelope-from git) Date: Thu, 27 Oct 2022 00:24:29 GMT Message-Id: <202210270024.29R0OTbU033517@gitrepo.freebsd.org> To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-main@FreeBSD.org From: "Jason A. Harmening" Subject: git: 080ef8a41851 - main - Add VV_CROSSLOCK vnode flag to avoid cross-mount lookup LOR List-Id: Commit messages for the main branch of the src repository List-Archive: https://lists.freebsd.org/archives/dev-commits-src-main List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-dev-commits-src-main@freebsd.org X-BeenThere: dev-commits-src-main@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-Git-Committer: jah X-Git-Repository: src X-Git-Refname: refs/heads/main X-Git-Reftype: branch X-Git-Commit: 080ef8a41851fb2e2cf066cf2d8e25daff37f91f Auto-Submitted: auto-generated ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org; s=dkim; t=1666830269; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=p4jDkPVO1OwOCR8+pOMEiOTVwvEtlYIA3H6bDNN9aD4=; b=ydaZPigvsVY00NXtkuvnkGo3IoEg+SlUOPcd+/TLueLiWsZcLerbciK9kIXggO14c+Xdo4 nxd56GWI4KefNAnkVAwKLeqsB6D5mqYKGCtEAMLfb+04aGGyXcbD1g2uuKro0tbuK8Fhtg 7NW5vUpSVeHcEPqeeA981fFJvTelu9ejL12A3Np750Goc8JV0RDk0Qp6DZFzsr/iB0rwey IZYkPGvKHdKRa7DOg9CwmbnuehpqSGg0NddvrOFhYc+lOgdW6GJvEzA3SXgwDgEFQbdkSK 14GQd7mUJO89jV4ryGOvZvkWJoiyXXNniiuj6W+8BG0usXsTOH/ZyaKKBsAd6w== ARC-Seal: i=1; s=dkim; d=freebsd.org; t=1666830269; a=rsa-sha256; cv=none; b=V2kbdhT7+8mHndjE0NtfwJtvYBoIq60VDpFGOGJ/OSnyC0FKrOguyBTKpNcIcR7JSFHdhc Ig5jO1DSFFah1qwpQtXBiyosRETCjXJb837m+BFGBy5bgpAxsHw7qPwllCWGassNJGeGtN RRU9ei6oxh9978xCBJWPN9aZHIXyEE9ZYsUEzcAYye29HV4uXCbDg7JR1PK0wXOsHtfMA+ x4yiqQlxmy0QWn9NopwTLu8b5JiOIHBNpUqKGpRJusPWulLFFP0XcxsMU03jSRS5WWRfCE SIM1kMi3ngCZOeGWY5Z6nbxLqwAC1g5KaJ/5AkwIGivKbIlSlPLg09paw6xP1Q== ARC-Authentication-Results: i=1; mx1.freebsd.org; none X-ThisMailContainsUnwantedMimeParts: N The branch main has been updated by jah: URL: https://cgit.FreeBSD.org/src/commit/?id=080ef8a41851fb2e2cf066cf2d8e25daff37f91f commit 080ef8a41851fb2e2cf066cf2d8e25daff37f91f Author: Jason A. Harmening AuthorDate: 2022-08-05 05:32:49 +0000 Commit: Jason A. Harmening CommitDate: 2022-10-27 00:33:03 +0000 Add VV_CROSSLOCK vnode flag to avoid cross-mount lookup LOR When a lookup operation crosses into a new mountpoint, the mountpoint must first be busied before the root vnode can be locked. When a filesystem is unmounted, the vnode covered by the mountpoint must first be locked, and then the busy count for the mountpoint drained. Ordinarily, these two operations work fine if executed concurrently, but with a stacked filesystem the root vnode may in fact use the same lock as the covered vnode. By design, this will always be the case for unionfs (with either the upper or lower root vnode depending on mount options), and can also be the case for nullfs if the target and mount point are the same (which admittedly is very unlikely in practice). In this case, we have LOR. The lookup path holds the mountpoint busy while waiting on what is effectively the covered vnode lock, while a concurrent unmount holds the covered vnode lock and waits for the mountpoint's busy count to drain. Attempt to resolve this LOR by allowing the stacked filesystem to specify a new flag, VV_CROSSLOCK, on a covered vnode as necessary. Upon observing this flag, the vfs_lookup() will leave the covered vnode lock held while crossing into the mountpoint. Employ this flag for unionfs with the caveat that it can't be used for '-o below' mounts until other unionfs locking issues are resolved. Reported by: pho Tested by: pho Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D35054 --- sys/fs/unionfs/union_vfsops.c | 27 +++++++++++++++++++++++++++ sys/kern/vfs_lookup.c | 22 +++++++++++++++++++--- sys/kern/vfs_subr.c | 25 +++++++++++++++---------- sys/sys/vnode.h | 1 + 4 files changed, 62 insertions(+), 13 deletions(-) diff --git a/sys/fs/unionfs/union_vfsops.c b/sys/fs/unionfs/union_vfsops.c index 7e24d089bada..c6d668606ec9 100644 --- a/sys/fs/unionfs/union_vfsops.c +++ b/sys/fs/unionfs/union_vfsops.c @@ -310,6 +310,30 @@ unionfs_domount(struct mount *mp) return (ENOENT); } + /* + * Specify that the covered vnode lock should remain held while + * lookup() performs the cross-mount walk. This prevents a lock-order + * reversal between the covered vnode lock (which is also locked by + * unionfs_lock()) and the mountpoint's busy count. Without this, + * unmount will lock the covered vnode lock (directly through the + * covered vnode) and wait for the busy count to drain, while a + * concurrent lookup will increment the busy count and then lock + * the covered vnode lock (indirectly through unionfs_lock()). + * + * Note that we can't yet use this facility for the 'below' case + * in which the upper vnode is the covered vnode, because that would + * introduce a different LOR in which the cross-mount lookup would + * effectively hold the upper vnode lock before acquiring the lower + * vnode lock, while an unrelated lock operation would still acquire + * the lower vnode lock before the upper vnode lock, which is the + * order unionfs currently requires. + */ + if (!below) { + vn_lock(mp->mnt_vnodecovered, LK_EXCLUSIVE | LK_RETRY); + mp->mnt_vnodecovered->v_vflag |= VV_CROSSLOCK; + VOP_UNLOCK(mp->mnt_vnodecovered); + } + MNT_ILOCK(mp); if ((lowermp->mnt_flag & MNT_LOCAL) != 0 && (uppermp->mnt_flag & MNT_LOCAL) != 0) @@ -362,6 +386,9 @@ unionfs_unmount(struct mount *mp, int mntflags) if (error) return (error); + vn_lock(mp->mnt_vnodecovered, LK_EXCLUSIVE | LK_RETRY); + mp->mnt_vnodecovered->v_vflag &= ~VV_CROSSLOCK; + VOP_UNLOCK(mp->mnt_vnodecovered); vfs_unregister_upper(ump->um_lowervp->v_mount, &ump->um_lower_link); vfs_unregister_upper(ump->um_uppervp->v_mount, &ump->um_upper_link); free(ump, M_UNIONFSMNT); diff --git a/sys/kern/vfs_lookup.c b/sys/kern/vfs_lookup.c index 7e6ab13a4fd0..589decb14fc2 100644 --- a/sys/kern/vfs_lookup.c +++ b/sys/kern/vfs_lookup.c @@ -901,6 +901,8 @@ vfs_lookup(struct nameidata *ndp) struct componentname *cnp = &ndp->ni_cnd; int lkflags_save; int ni_dvp_unlocked; + int crosslkflags; + bool crosslock; /* * Setup: break out flag bits into variables. @@ -1232,18 +1234,32 @@ good: */ while (dp->v_type == VDIR && (mp = dp->v_mountedhere) && (cnp->cn_flags & NOCROSSMOUNT) == 0) { + crosslock = (dp->v_vflag & VV_CROSSLOCK) != 0; + crosslkflags = compute_cn_lkflags(mp, cnp->cn_lkflags, + cnp->cn_flags); + if (__predict_false(crosslock) && + (crosslkflags & LK_EXCLUSIVE) != 0 && + VOP_ISLOCKED(dp) != LK_EXCLUSIVE) { + vn_lock(dp, LK_UPGRADE | LK_RETRY); + if (VN_IS_DOOMED(dp)) { + error = ENOENT; + goto bad2; + } + } if (vfs_busy(mp, 0)) continue; - vput(dp); + if (__predict_true(!crosslock)) + vput(dp); if (dp != ndp->ni_dvp) vput(ndp->ni_dvp); else vrele(ndp->ni_dvp); vrefact(vp_crossmp); ndp->ni_dvp = vp_crossmp; - error = VFS_ROOT(mp, compute_cn_lkflags(mp, cnp->cn_lkflags, - cnp->cn_flags), &tdp); + error = VFS_ROOT(mp, crosslkflags, &tdp); vfs_unbusy(mp); + if (__predict_false(crosslock)) + vput(dp); if (vn_lock(vp_crossmp, LK_SHARED | LK_NOWAIT)) panic("vp_crossmp exclusively locked or reclaimed"); if (error) { diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c index adf2968e7f56..931153ea32a6 100644 --- a/sys/kern/vfs_subr.c +++ b/sys/kern/vfs_subr.c @@ -770,17 +770,22 @@ SYSINIT(vfs, SI_SUB_VFS, SI_ORDER_FIRST, vntblinit, NULL); * +->F->D->E * * The lookup() process for namei("/var") illustrates the process: - * VOP_LOOKUP() obtains B while A is held - * vfs_busy() obtains a shared lock on F while A and B are held - * vput() releases lock on B - * vput() releases lock on A - * VFS_ROOT() obtains lock on D while shared lock on F is held - * vfs_unbusy() releases shared lock on F - * vn_lock() obtains lock on deadfs vnode vp_crossmp instead of A. - * Attempt to lock A (instead of vp_crossmp) while D is held would - * violate the global order, causing deadlocks. + * 1. VOP_LOOKUP() obtains B while A is held + * 2. vfs_busy() obtains a shared lock on F while A and B are held + * 3. vput() releases lock on B + * 4. vput() releases lock on A + * 5. VFS_ROOT() obtains lock on D while shared lock on F is held + * 6. vfs_unbusy() releases shared lock on F + * 7. vn_lock() obtains lock on deadfs vnode vp_crossmp instead of A. + * Attempt to lock A (instead of vp_crossmp) while D is held would + * violate the global order, causing deadlocks. * - * dounmount() locks B while F is drained. + * dounmount() locks B while F is drained. Note that for stacked + * filesystems, D and B in the example above may be the same lock, + * which introdues potential lock order reversal deadlock between + * dounmount() and step 5 above. These filesystems may avoid the LOR + * by setting VV_CROSSLOCK on the covered vnode so that lock B will + * remain held until after step 5. */ int vfs_busy(struct mount *mp, int flags) diff --git a/sys/sys/vnode.h b/sys/sys/vnode.h index 167f1904b70d..52f735713a30 100644 --- a/sys/sys/vnode.h +++ b/sys/sys/vnode.h @@ -276,6 +276,7 @@ struct xvnode { #define VV_FORCEINSMQ 0x1000 /* force the insmntque to succeed */ #define VV_READLINK 0x2000 /* fdescfs linux vnode */ #define VV_UNREF 0x4000 /* vunref, do not drop lock in inactive() */ +#define VV_CROSSLOCK 0x8000 /* vnode lock is shared w/ root mounted here */ #define VMP_LAZYLIST 0x0001 /* Vnode is on mnt's lazy list */