[Bug 258208] [zfs] locks up when using rollback or destroy on both 13.0-RELEASE & sysutils/openzfs port

From: <bugzilla-noreply_at_freebsd.org>
Date: Thu, 28 Oct 2021 16:03:38 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=258208

--- Comment #20 from Mark Johnston <markj@FreeBSD.org> ---
(In reply to Andriy Gapon from comment #19)
Yeah, I was also wondering about changing the lock order.  I think that would
fix the deadlock but this is getting kind of hairy.  Maybe we could busy the
mountpoint while sleeping on the teardown lock, but I'm not sure.

Taking a step back: zfs_rezget() is triggering the deadlock by busying page
cache pages, which it does so that it can purge cached data which would
otherwise become stale when the dataset is resumed.  But, it really only needs
to purge valid pages, invalid pages won't be mapped, and ZFS marks file pages
as valid while holding the teardown lock.  The deadlock happens when
zfs_rezget() is purging _invalid_ pages that getpages is supposed to fill.  So
perhaps zfs_rezget() can simply ignore invalid pages.

I tried implementing this and it fixes the deadlock in my stress test, which
simply runs buildkernel in a loop and simultaneously rolls back the dataset in
a loop.  This test also uncovered some UAFs, btw:

It's maybe a bit too hacky, since it means that we check the valid state of a
page without busying it, and only the VM object lock is held.  This is ok for
now at least: to mark a page valid, the page must be busied, but the object
lock _and_ busy lock are needed to mark a page invalid.  So if
vm_object_page_remove() encounters an invalid page, there is no guarantee that
it won't later become valid.  For ZFS I think it's safe to assume that vnode
pages only transition invalid->valid under the teardown lock, but that seems
like a delicate assumption...

The only other solution I can see is to add a new VOP to lock a vnode in
preparation for a getpages call.  This VOP could acquire the teardown lock, so
we get a consistent lock order vnode->teardown->busy, and then we don't need to
deal with recursion.  It's not just the page fault handler which needs it
though, at least sendfile and the image activator need it as well.

-- 
You are receiving this mail because:
You are the assignee for the bug.