svn commit: r326731 - head/sys/ufs/ffs

Sun Dec 10 03:14:58 UTC 2017

On Sat, Dec 09, 2017 at 07:36:59PM -0700, Warner Losh wrote:
> On Sat, Dec 9, 2017 at 11:03 AM, Andriy Gapon <avg at freebsd.org> wrote:
> 
> > On 09/12/2017 17:44, Mark Johnston wrote:
> > > Some GEOMs do not appear to handle BIO_ORDERED correctly, meaning
> > > that the barrier write may not work as intended.
> 
> 
> There are a few places where we send down a BIO_ORDERED BIO_FLUSH command
> (see softdep_synchronize() for one). Will those matter?

Some classes have separate handling for BIO_FLUSH, so it depends. I
think gmirror's handling is buggy independent of BIO_ORDERED:
g_mirror_start() sends BIO_FLUSH commands directly to the mirrors,
while reads and writes are queued for handling by the gmirror worker
thread. So as far as I can tell, a BIO_WRITE which arrives at the
gmirror provider before a BIO_FLUSH might be sent to the mirrors
after that BIO_FLUSH. I would expect BIO_FLUSH to implicitly have
something like release semantics, i.e., a BIO_FLUSH shouldn't be
reordered with a BIO_WRITE that preceded it. But I might be
misunderstanding.
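
To make the suspected reordering concrete, here is a toy userland model
of the dispatch pattern described above. It is only a sketch: every
toy_* name is made up for illustration and none of this is the actual
gmirror code. It assumes only what is stated above, namely that flushes
go straight to the disk while writes wait in a queue for a worker.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these names exist in the gmirror source. */
enum toy_cmd { TOY_WRITE, TOY_FLUSH };

#define TOY_MAX 16

struct toy_mirror {
	enum toy_cmd queue[TOY_MAX];	/* requests waiting for the worker */
	size_t nqueued;
	enum toy_cmd disk[TOY_MAX];	/* order in which the disk sees them */
	size_t ndisk;
};

/* Models the start routine: flushes bypass the queue, writes do not. */
static void
toy_start(struct toy_mirror *m, enum toy_cmd cmd)
{
	if (cmd == TOY_FLUSH) {
		assert(m->ndisk < TOY_MAX);
		m->disk[m->ndisk++] = cmd;
	} else {
		assert(m->nqueued < TOY_MAX);
		m->queue[m->nqueued++] = cmd;
	}
}

/* Models the worker thread draining its queue at some later time. */
static void
toy_worker_run(struct toy_mirror *m)
{
	for (size_t i = 0; i < m->nqueued; i++) {
		assert(m->ndisk < TOY_MAX);
		m->disk[m->ndisk++] = m->queue[i];
	}
	m->nqueued = 0;
}

/* Returns 1 if the flush reached the disk before the earlier write. */
static int
toy_flush_overtakes_write(void)
{
	struct toy_mirror m = { .nqueued = 0, .ndisk = 0 };

	toy_start(&m, TOY_WRITE);	/* arrives first */
	toy_start(&m, TOY_FLUSH);	/* arrives second */
	toy_worker_run(&m);		/* worker only runs now */
	return (m.ndisk == 2 && m.disk[0] == TOY_FLUSH &&
	    m.disk[1] == TOY_WRITE);
}
```

In this model toy_flush_overtakes_write() returns 1 whenever the worker
runs after the flush has been dispatched, which is the ordering I would
not expect a flush to allow.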

> As I've noted elsewhere: I'd really like to kill BIO_ORDERED since it has
> too many icky effects (and BIO_FLUSH + BIO_ORDERED isn't guaranteed to do,
> well, anything, since it can turn into a NOP in a number of places). Plus
> many of the implementations of BIO_ORDERED assume the drive is like SCSI
> and you just set the right tag to make it 'ordered'. For ATA we issue a
> non-NCQ command, which means a full drain of outstanding commands, sending
> this command, then starting them again, which really shuts down the
> parallelism we implemented NCQ for :(. We do similar for NVMe, which is
> even worse. There we have multiple submission queues in the hardware. To
> simulate it, we do a similar drain, but that's going to get in the way as
> we move to NUMA and systems where we try to do the I/O entirely on one CPU
> (both submission and completion), and ordered I/O is guaranteed lock
> contention.

Independent of this, it doesn't really look like we have any way of
handling write errors when dependencies are enforced using BIO_ORDERED.
In the case of the babarrierwrite() consumer in FFS, what happens if the
inode block write fails due to a transient error, but the following CG
update succeeds?
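
A toy model of the failure case I have in mind, as a sketch only: the
toy_* names and the two-flag "disk" are hypothetical and are not FFS or
GEOM code. The one assumption it encodes is that an ordering constraint
fixes the sequence of two writes but does not make the second one
conditional on the first succeeding.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these names are hypothetical, not FFS or GEOM code. */
struct toy_disk {
	bool inode_block_valid;	/* inode block write reached the media */
	bool cg_refers_to_it;	/* dependent CG update reached the media */
};

/*
 * Issue two writes with only an ordering constraint between them.
 * Ordering fixes the sequence; it does not make the second write
 * conditional on the first one completing successfully.
 */
static void
toy_ordered_pair(struct toy_disk *d, bool first_fails)
{
	if (!first_fails)
		d->inode_block_valid = true;	/* barrier write */
	d->cg_refers_to_it = true;	/* following write, always issued */
}

/* Returns true if the disk holds a CG update whose dependency is missing. */
static bool
toy_inconsistent(const struct toy_disk *d)
{
	return (d->cg_refers_to_it && !d->inode_block_valid);
}
```

With first_fails set, toy_inconsistent() reports true: the dependent
update is on disk but the block it depends on is not, and nothing in the
ordering mechanism itself gives the caller a chance to stop the second
write.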