git: 0c5f290ed909 - stable/13 - zfs: Fix a deadlock between page busy and the teardown lock

From: Mark Johnston <markj_at_FreeBSD.org>
Date: Sun, 05 Dec 2021 18:38:52 UTC
The branch stable/13 has been updated by markj:

URL: https://cgit.FreeBSD.org/src/commit/?id=0c5f290ed90963b64b05a2098e0b11c3d9f6471f

commit 0c5f290ed90963b64b05a2098e0b11c3d9f6471f
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2021-11-20 16:21:25 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2021-12-05 18:38:35 +0000

    zfs: Fix a deadlock between page busy and the teardown lock
    
    When rolling back a dataset, ZFS has to purge file data resident in the
    system page cache.  To do this, it loops over all vnodes for the
    mountpoint and calls vn_pages_remove() to purge pages associated with
    the vnode's VM object.  Each page is thus exclusively busied while the
    dataset's teardown write lock is held.
    
    When handling a page fault on a mapped ZFS file, FreeBSD's page fault
    handler busies newly allocated pages and then uses VOP_GETPAGES to fill
    them.  The ZFS getpages VOP acquires the teardown read lock with vnode
    pages already busied.  This represents a lock order reversal which can
    lead to deadlock.
    
    To break the deadlock, observe that zfs_rezget() need only purge those
    pages marked valid, and that pages busied by the page fault handler are,
    by definition, invalid.  Furthermore, ZFS pages always transition from
    invalid to valid with the teardown lock held, and ZFS never creates
    partially valid pages.  Thus, zfs_rezget() can use the new
    vn_pages_remove_valid() to skip over pages busied by the fault handler.
    
    PR:             258208
    Tested by:      pho
    Reviewed by:    avg, sef, kib
    Sponsored by:   The FreeBSD Foundation
    
    (cherry picked from commit 705a6ee2b6112c3a653b2bd68f961a8b5b8071a4)
---
 sys/contrib/openzfs/module/os/freebsd/zfs/zfs_znode.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_znode.c b/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_znode.c
index 7b6f77e2c669..d14df9d88a35 100644
--- a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_znode.c
+++ b/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_znode.c
@@ -1083,9 +1083,18 @@ zfs_rezget(znode_t *zp)
 	 * the vnode in case of error, but currently we cannot do that
 	 * because of the LOR between the vnode lock and z_teardown_lock.
 	 * So, instead, we have to "doom" the znode in the illumos style.
+	 *
+	 * Ignore invalid pages during the scan.  This is to avoid deadlocks
+	 * between page busying and the teardown lock, as pages are busied prior
+	 * to a VOP_GETPAGES operation, which acquires the teardown read lock.
+	 * Such pages will be invalid and can safely be skipped here.
 	 */
 	vp = ZTOV(zp);
+#if __FreeBSD_version >= 1300522
+	vn_pages_remove_valid(vp, 0, 0);
+#else
 	vn_pages_remove(vp, 0, 0);
+#endif
 
 	ZFS_OBJ_HOLD_ENTER(zfsvfs, obj_num);