Re: bio re-ordering
- Reply: Peter Jeremy : "Re: bio re-ordering"
- In reply to: Konstantin Belousov : "Re: bio re-ordering"
Date: Sat, 29 Jan 2022 05:32:02 UTC
On Fri, Jan 28, 2022 at 9:38 PM Konstantin Belousov <kostikbel@gmail.com> wrote:

> On Sat, Jan 29, 2022 at 03:29:39PM +1100, peterj@freebsd.org wrote:
> > I'm working on a GEOM Gate network client to better handle high-latency
> > connections and have some questions regarding bio ordering assumptions
> > (alternatively, how much should I be able to re-order bio requests
> > without breaking things).  Within geom_gate, an incoming bio request is
> > retrieved from the kernel using a G_GATE_CMD_START ioctl, processed in
> > userland (typically by forwarding it to a remote system) and then
> > returned via a G_GATE_CMD_DONE ioctl.  My GEOM Gate client can reorder
> > requests quite aggressively and I suspect it's breaking some kernel
> > assumptions regarding bio behaviour.  The following questions assume
> > that BIO_READ, BIO_WRITE and BIO_FLUSH are valid but BIO_DELETE isn't
> > supported.
> >
> > a) In the absence of BIO_FLUSH operations, what (if any) are the limits
> >    on reordering operations?  Given a block that initially contains A,
> >    followed by a write B, read and write C, is there any constraint on
> >    which content the read returns?
>
> There are no limits.  Either other software entities, or hardware itself,
> can process requests in arbitrary order.  This is why things are typically
> done in the completion handler, and part of the reason why the complexity
> of UFS SU exists.
>
> > b) Are individual BIO_READ and BIO_WRITE operations expected to be atomic
> >    with respect to other BIO_WRITE operations?  Given 2 adjacent blocks
> >    that initially contain AB, and successive write CD, read and write EF
> >    operations to those blocks, is it expected that the read would return
> >    CD (or maybe AB or EF, assuming that's valid from the previous
> >    question) or could the write operations partially complete in
> >    different orders, resulting in something like AD, CF, EB etc?
>
> No.  At the very least, underlying entities can split a request into
> several, each of which is ordered individually.  Typically, it is
> higher-level code that ensures that there are no concurrent modifications
> of the same block.  For instance, we exclusively lock vnodes and buffers
> around metadata updates.  Similarly, we lock buffers until the data is
> written to the device.
>
> > c) I assume that a BIO_FLUSH should not return DONE until all previously
> >    issued write operations have completed.  Is it required that write
> >    operations issued after the BIO_FLUSH must not complete before the
> >    BIO_FLUSH completes?
>
> UFS SU relies on BIO_FLUSH being the full barrier.

I think that UFS relies on two ordering primitives, both marked with
BIO_ORDERED today.  That's what most of the drivers key off of.  We always
set BIO_ORDERED on all the BIO_FLUSH events as far as I can tell.  Also,
anything that sets a B_BARRIER at the upper layers also gets BIO_ORDERED
added to it.  b*barrierwrite() sets this, and that's used in the ffs_alloc
code.

Warner
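
To make the barrier rule above concrete, here is a minimal userland sketch in C of how a reordering client might decide which queued requests are safe to start. It is only an illustration under stated assumptions: `struct req`, `enum req_cmd`, `is_barrier()` and `may_start()` are hypothetical names, not the kernel's `struct bio` or the geom_gate API, and the `ordered` field merely stands in for BIO_ORDERED. The model is deliberately conservative: a barrier waits for everything queued before it, and nothing queued after a barrier starts until the barrier is done.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request model; NOT the kernel's struct bio or BIO_* values. */
enum req_cmd { REQ_READ, REQ_WRITE, REQ_FLUSH };

struct req {
	enum req_cmd cmd;
	bool ordered;	/* stands in for BIO_ORDERED (assumption) */
	bool done;	/* completion has already been reported */
};

/* Treat flushes and explicitly ordered requests as full barriers. */
static bool
is_barrier(const struct req *r)
{
	return (r->cmd == REQ_FLUSH || r->ordered);
}

/*
 * Decide whether q[idx] may be started now; q[0..n-1] is in arrival order.
 *  - A barrier may start only once every earlier request is done.
 *  - A non-barrier may start unless an earlier barrier is still pending.
 * Between barriers, requests may be started in any order.
 */
static bool
may_start(const struct req *q, size_t n, size_t idx)
{
	assert(idx < n);
	for (size_t i = 0; i < idx; i++) {
		if (q[i].done)
			continue;
		if (is_barrier(&q[idx]) || is_barrier(&q[i]))
			return (false);
	}
	return (true);
}
```

With a queue of write, write, flush, write, both leading writes may start immediately and in either order, the flush becomes startable only after both writes are done, and the trailing write only after the flush is done. Holding back the start of later requests is stricter than holding back only their completion, but it is the simplest way to guarantee that nothing issued after the barrier completes before it.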