Re: bio re-ordering

From: Warner Losh <imp_at_bsdimp.com>
Date: Sat, 29 Jan 2022 05:32:02 UTC
On Fri, Jan 28, 2022 at 9:38 PM Konstantin Belousov <kostikbel@gmail.com>
wrote:

> On Sat, Jan 29, 2022 at 03:29:39PM +1100, peterj@freebsd.org wrote:
> > I'm working on a GEOM Gate network client to better handle high-latency
> > connections and have some questions regarding bio ordering assumptions
> > (alternatively, how much should I be able to re-order bio requests without
> > breaking things).  Within geom_gate, an incoming bio request is retrieved
> > from the kernel using a G_GATE_CMD_START ioctl, processed in userland
> > (typically by forwarding it to a remote system) and then returned via a
> > G_GATE_CMD_DONE ioctl.  My GEOM Gate client can reorder requests quite
> > aggressively and I suspect it's breaking some kernel assumptions regarding
> > bio behaviour.  The following questions assume that BIO_READ, BIO_WRITE and
> > BIO_FLUSH are valid but BIO_DELETE isn't supported.
> >
> > a) In the absence of BIO_FLUSH operations, what (if any) are the limits on
> >    reordering operations?  Given a block that initially contains A, followed
> >    by a write B, read and write C, is there any constraint on which content
> >    the read returns?
> There are no limits.  Either other software entities, or hardware itself,
> can process requests in arbitrary order.  This is why things are typically
> done in the completion handler, and part of the reason why the complexity
> of UFS SU exists.
>
> >
> > b) Are individual BIO_READ and BIO_WRITE operations expected to be atomic
> >    with respect to other BIO_WRITE operations?  Given 2 adjacent blocks that
> >    initially contain AB, and successive write CD, read and write EF
> >    operations to those blocks, is it expected that the read would return CD
> >    (or maybe AD or EF, assuming that's valid from the previous question) or
> >    could the write operations partially complete in different orders,
> >    resulting in something like AD, CF, EB etc?
> No.  At the very least, underlying entities can split a request into several,
> each of which is ordered individually.  Typically, it is higher-level
> code that ensures that there are no concurrent modifications of the same
> block.  For instance, we exclusively lock vnodes and buffers around
> metadata updates.  Similarly, we lock buffers until the data is written
> to the device.
>
> >
> > c) I assume that a BIO_FLUSH should not return DONE until all preceding
> >    write operations have completed.  Is it required that write
> >    operations issued after the BIO_FLUSH must not complete before the
> >    BIO_FLUSH completes?
> UFS SU relies on BIO_FLUSH being the full barrier.
>

I think that UFS relies on two ordering primitives, both marked with
BIO_ORDERED today. That's what most of the drivers key off of. We always
set BIO_ORDERED on all the BIO_FLUSH events as far as I can tell.

Also, anything that sets B_BARRIER at the upper layers gets BIO_ORDERED
added to it. b*barrierwrite() sets this, and that's used in the ffs_alloc
code.
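
To make the barrier semantics concrete, here is a minimal userland sketch of
a conservative dispatch check for a reordering client. It is an illustration
only: struct req, its fields and may_dispatch() are hypothetical names, not
part of geom_gate or the kernel, and the `ordered` field merely stands in for
BIO_ORDERED. It also assumes the client can tell which requests are ordered,
for example by treating every flush as ordered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userland request record; this is NOT the kernel's struct bio. */
enum req_cmd { REQ_READ, REQ_WRITE, REQ_FLUSH };

struct req {
	enum req_cmd cmd;
	bool ordered;	/* stand-in for BIO_ORDERED (assumed visible to the client) */
	bool done;	/* request has already completed */
};

/*
 * Decide whether reqs[i] may be dispatched now, where reqs[0..n-1] is the
 * arrival order.  The rule is a full barrier:
 *   - an ordered request waits for every earlier request to complete;
 *   - any request waits for every earlier ordered request to complete.
 * Unordered requests with no pending barrier in front of them may be
 * dispatched in any order.
 */
static bool
may_dispatch(const struct req *reqs, size_t n, size_t i)
{
	assert(reqs != NULL);

	if (i >= n || reqs[i].done)
		return (false);
	for (size_t j = 0; j < i; j++) {
		if (reqs[j].done)
			continue;
		/* An unfinished earlier request blocks us if either side is ordered. */
		if (reqs[i].ordered || reqs[j].ordered)
			return (false);
	}
	return (true);
}
```

With a queue of write, write, flush (ordered), write, the two leading writes
can go out in either order, the flush has to wait for both, and the trailing
write has to wait for the flush. Holding requests back at dispatch time is
stricter than holding back only their completion, but it is the simpler
rule to get right.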

Warner