Re: git: 705a6ee2b611 - main - zfs: Fix a deadlock between page busy and the teardown lock

From: Mark Johnston <markj_at_freebsd.org>
Date: Sat, 20 Nov 2021 16:37:45 UTC
On Sat, Nov 20, 2021 at 04:36:47PM +0000, Mark Johnston wrote:
> The branch main has been updated by markj:
> 
> URL: https://cgit.FreeBSD.org/src/commit/?id=705a6ee2b6112c3a653b2bd68f961a8b5b8071a4
> 
> commit 705a6ee2b6112c3a653b2bd68f961a8b5b8071a4
> Author:     Mark Johnston <markj@FreeBSD.org>
> AuthorDate: 2021-11-20 16:21:25 +0000
> Commit:     Mark Johnston <markj@FreeBSD.org>
> CommitDate: 2021-11-20 16:21:25 +0000
> 
>     zfs: Fix a deadlock between page busy and the teardown lock
>     
>     When rolling back a dataset, ZFS has to purge file data resident in the
>     system page cache.  To do this, it loops over all vnodes for the
>     mountpoint and calls vn_pages_remove() to purge pages associated with
>     the vnode's VM object.  Each page is thus exclusively busied while the
>     dataset's teardown write lock is held.
>     
>     When handling a page fault on a mapped ZFS file, FreeBSD's page fault
>     handler busies newly allocated pages and then uses VOP_GETPAGES to fill
>     them.  The ZFS getpages VOP acquires the teardown read lock with vnode
>     pages already busied.  This represents a lock order reversal which can
>     lead to deadlock.
>     
>     To break the deadlock, observe that zfs_rezget() need only purge those
>     pages marked valid, and that pages busied by the page fault handler are,
>     by definition, invalid.  Furthermore, ZFS pages always transition from
>     invalid to valid with the teardown lock held, and ZFS never creates
>     partially valid pages.  Thus, zfs_rezget() can use the new
>     vn_pages_remove_valid() to skip over pages busied by the fault handler.

I'll get this merged upstream once the new KPI is available in
stable/13, but didn't see a reason to hold off committing directly.