svn commit: r326731 - head/sys/ufs/ffs

Warner Losh imp at bsdimp.com
Wed Dec 20 22:31:10 UTC 2017


On Wed, Dec 20, 2017 at 1:32 PM, Mark Johnston <markj at freebsd.org> wrote:

> On Sat, Dec 09, 2017 at 10:14:50PM -0500, Mark Johnston wrote:
> > On Sat, Dec 09, 2017 at 07:36:59PM -0700, Warner Losh wrote:
> > > On Sat, Dec 9, 2017 at 11:03 AM, Andriy Gapon <avg at freebsd.org> wrote:
> > >
> > > > On 09/12/2017 17:44, Mark Johnston wrote:
> > > > > Some GEOMs do not appear to handle BIO_ORDERED correctly, meaning
> that
> > > > the
> > > > > barrier write may not work as intended.
> > >
> > >
> > > There's a few places we send down a BIO_ORDERED BIO_FLUSH command
> > > (see softdep_synchronize for one). Will those matter?
> >
> > Some classes have separate handling for BIO_FLUSH, so it depends. I
> > think gmirror's handling is buggy independent of BIO_ORDERED:
> > g_mirror_start() sends BIO_FLUSH commands directly to the mirrors,
> > while reads and writes are queued for handling by the gmirror worker
> > thread. So as far as I can tell, a BIO_WRITE which arrives at the
> > gmirror provider before a BIO_FLUSH might be sent to the mirrors
> > after that BIO_FLUSH. I would expect BIO_FLUSH to implicitly have
> > something like release semantics, i.e., a BIO_FLUSH shouldn't be
> > reordered with a BIO_WRITE that preceded it. But I might be
> > misunderstanding.
> >
> > > As I've noted elsewhere: I'd really like to kill BIO_ORDERED since it
> has
> > > too many icky effects (and BIO_FLUSH + BIO_ORDERED isn't guaranteed to
> do,
> > > well, anything since it can turn into a NOP in a number of places. Plus
> > > many of the implementations of BIO_ORDERED assume the drive is like
> SCSI
> > > and you just set the right tag to make it 'ordered'. For ATA we issue
> a non
> > > NCQ command, which is a full drain of outstanding commands, send this
> > > command, then start them again which really shuts down the parallelism
> we
> > > implemented NCQ for :(. We do similar for NVME which is even worse.
> There
> > > we have multiple submission queues in the hardware. To simulated it,
> we do
> > > a similar drain, but that's going to get in the way as we move to NUMA
> and
> > > systems where we try to do the I/O entirely on one CPU (both
> submission and
> > > completion) and ordered I/O is guaranteed lock contention.
> >
> > Independent of this, it doesn't really look like we have any way of
> > handling write errors when dependencies are enforced using BIO_ORDERED.
> > In the case of the babarrierwrite() consumer in FFS, what happens if the
> > inode block write fails due to a transient error, but the following CG
> > update succeeds?
>
> Looking at this some more, I think iosched is ok since bioq_disksort()
> does in fact handle BIO_ORDERED.


Except when it doesn't. BIO_READs are posted to the device completely
independent of BIO_ORDERED commands. I believe this is OK, but since
BIO_ORDERED is so ill defined, I can't be sure.


> I'm concerned though that there are
> places in the tree where we should be using BIO_ORDERED but aren't.
>

I maintain that there are no such places :) If you want to do ORDERED I/O,
you have to order it yourself by flushing all the other outstanding I/O,
then issuing the I/O you want to follow the completion of all the others,
then issue new I/O. With tagged queueing, otherwise, you get
non-determinate write order to the disk.


> gmirror metadata writes for instance are performed synchronously but
> should not be reordered with preceding BIO_WRITEs.
>

Right, gmirror should freeze the software bioq, drain the hardware queue,
issue the right, then unfreeze its bioq in these cases. BIO_ORDERED and
BIO_FLUSH isn't going to help much. It assumes there's only one 'flow' of
these to the device from the upper layers, when in fact there can be
several for multiple partitions. If gmirror needs strong ordering
semantics, it needs to actually enforce the exact semantics it wants.
Otherwise, tagged queueing can get in the way (though right now, to its
credit, the ATA driver schedules the ordered I/O as non-tagged-queue so
that it causes this mini-freeze, SCSI just tags it and hopes for the best,
giving slightly different semantics, and NVMe I think drains all queue,
then schedules the I/O then starts again). NVMe is an even poorer match to
whatever BIO_ORDERED is supposed to mean than even ATA since it can do
multiple streams to the drive at once. In that environment, BIO_ORDERED
makes as much sense as SPLHIGH does on a MP machine.


> Anyway, I've posted a review for some gmirror I/O ordering fixes here if
> anyone is interested: https://reviews.freebsd.org/D13559
>

I'll take a look at it, thanks.

Warner


More information about the svn-src-head mailing list