Re: livelock in vfs_bio_getpages with vn_io_fault_uiomove

From: Alan Somers <asomers_at_freebsd.org>
Date: Fri, 01 Oct 2021 03:33:24 UTC
On Thu, Sep 30, 2021 at 9:08 PM Konstantin Belousov <kostikbel@gmail.com> wrote:
>
> On Thu, Sep 30, 2021 at 03:11:02PM -0600, Alan Somers wrote:
> > I'm trying to adapt fusefs to use vn_io_fault_uiomove to fix the
> > deadlock described in the comments above vn_io_fault_doio [^1].  I can
> > reproduce the deadlock readily enough, and the fix seems simple enough
> > in other filesystems [^2][^3].  But when I try to apply the same fix
> > to fusefs, the deadlock changes into a livelock.  vfs_bio_getpages
> > loops infinitely because it reaches the "redo = true" state.  But on
> > looping, it never attempts to read from fusefs again.  Instead,
> > breadn_flags returns 0 without ever calling bufstrategy, from which I
> > infer that getblkx returned a block with B_CACHE set.  Despite that,
> > at least one of the requested pages in vfs_bio_getpages fails the
> > vm_page_all_valid(ma[i]) check.  Debugging further is wandering
> > outside my areas of expertise.  Could somebody please give me a tip?
> > What is supposed to mark those pages as valid?  Are there any other
> > undocumented conditions needed to use vn_io_fault_uiomove that msdosfs
> > and nfscl just happened to already meet?
> >
> > Grateful for any help,
> > -Alan
> >
> > [^1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=238340
> > [^2] https://github.com/freebsd/freebsd-src/commit/2aa3944510b50cbe6999344985a5a9c3208063b2
> > [^3] https://github.com/freebsd/freebsd-src/commit/ddfc47fdc98460b757c6d1dbe4562a0a339f228b
>
> Perhaps first you need to confirm that vfs_bio_getpages() sees a buffer
> with B_CACHE set but some pages still not fully valid (except the last
> page that is allowed to have invalid tail at EOF).
>
> When buffer strategy read returns, vfs_vmio_iodone() updates the pages
> validity bitmap according to the b_resid field of the buffer.  Look there,
> might be you did not set it properly.  Practically, both b_resid and b_bcount
> must be correctly set after io.
>
> In fact, after writing out all that, I realized that I am confused
> by your question. vn_io_fault_uiomove() needs to be used from
> VOP_READ/VOP_WRITE. If filesystem utilizes buffer cache, then there is
> a VOP_STRATEGY() implementation that fullfils the buffer cache requests
> for buffers reads and writes. VOP_READ and VOP_STRATEGY simply occurs
> at very different layers of the io stack. Typically, VOP_READ() does
> bread() which might trigger VOP_STRATEGY() to get the buffer, and then
> it performs vn_io_fault() to move data from locked buffer to userspace.
>
> The fact that your addition of vn_io_fault breaks something in VOP_STRATEGY()
> does not make sense.

Ahh, that last piece of information is useful.  In fusefs, both
VOP_READ and VOP_STRATEGY can end up going through the same path,
depending on cache settings, O_DIRECT, etc.  And during a VOP_STRATEGY
read, fusefs needs to use uiomove in order to move data not into the
user's buffer, but from the fuse daemon into the kernel.  But that's
not the cause of the livelock, because whether I use uiomove or
vn_io_fault_uiomove during VOP_STRATEGY I still get the same livelock.
I'll check b_bcount and b_resid tomorrow.
-Alan