[Bug 258208] [zfs] locks up when using rollback or destroy on both 13.0-RELEASE & sysutils/openzfs port

From: <bugzilla-noreply_at_freebsd.org>
Date: Sat, 25 Sep 2021 15:19:50 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=258208

--- Comment #7 from Mark Johnston <markj@FreeBSD.org> ---
I am not sure how best to fix this.  To elaborate a bit more, the deadlock
occurs because a rollback does a suspend/resume of the target fs.  This
involves taking the teardown write lock; one thing we do with the lock held is
call zfs_rezget() on all vnodes associated with the filesystem, which among
other things throws away all data cached in the page cache.  This requires
pages to be busied with the teardown write lock held, whereas the page fault
path busies the fault page first and only then takes the teardown read lock in
zfs_freebsd_getpages().  I am therefore inclined to think that
zfs_freebsd_getpages() should be responsible for breaking the deadlock as it
does in
https://cgit.freebsd.org/src/commit/?id=cd32b4f5b79c97b293f7be3fe9ddfc9024f7d734
.

zfs_freebsd_getpages() could perhaps trylock and upon failure return some
EAGAIN-like status to ask the fault handler to retry, but I don't see a way to
do that - vm_fault_getpages() squashes the error and does not allow the pager
to return KERN_RESOURCE_SHORTAGE.

Alternatively, zfs_freebsd_getpages() could perhaps wire and unbusy the page upon
a trylock failure.  Once it successfully acquires the teardown read lock, it
could re-lookup the fault page and compare or re-insert the wired page if
necessary.

OTOH I cannot see how this is handled on Linux.  In particular, I do not see
how their zfs_rezget() invalidates the page cache.
