Does UFS2 send BIO_FLUSH to GEOM when updating metadata (with softupdates)?

Kostik Belousov kostikbel at gmail.com
Sat Nov 26 08:03:57 UTC 2011


On Fri, Nov 25, 2011 at 11:25:13PM -0800, Kirk McKusick wrote:
> Kostik,
> 
> You are entirely correct when you say that the requirement for
> SU and SU+J is that notification of a disk-write completion
> means that the data is on the disk (stable). The problem
> that arises is that (apparently) some tag-queue implementations
> report back that tags have been written when in fact they have
> not been written.
Right, and my belief is that real hardware is not much affected,
except probably some ultra-cheap and old ATA disks. Another issue
is broken-by-design 'drivers' whose authors do not understand the
environment they are programming for.

> 
> I believe that the only way to ensure that a tagged request is
> on stable store is to send a BIO_BARRIER request to the disk. The
> BIO_BARRIER request is not supposed to return until all I/O
> requests that were sent down prior to the BIO_BARRIER have been
> committed to stable store.
> 
> If in fact the disk hardware lies about tag completion, my
> proposed way for SU and SU+J to use BIO_BARRIER is to send
> one down periodically (say every 100ms) and then defer processing
> any I/O completions from before the barrier request until the
> BIO_BARRIER completes. Since most SU activity is going on in
> background, the delay should not be too noticeable. The main
> place where it would likely show up is in fsync, which could be
> delayed for up to the period (100ms). For such cases, the fsync
> should issue its own BIO_BARRIER once it has initiated all of
> its required I/O.
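The periodic-barrier idea quoted above can be modelled as a small queue of deferred completions. The following is a minimal user-space sketch under stated assumptions, not kernel code and not Kirk's implementation: `struct barrier_state`, `write_done()`, `issue_barrier()` and `barrier_done()` are hypothetical names, no real GEOM interface is used, and only one barrier is allowed in flight at a time. A completion reported before a barrier is issued belongs to a write that was sent down before that barrier, so it is released when the barrier completes; anything reported later waits for the next barrier.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXPEND 64

/* Hypothetical record of a write the disk has reported as complete. */
struct completion {
    int blkno;
};

/* Hypothetical per-device state for the deferred-completion scheme. */
struct barrier_state {
    struct completion pending[MAXPEND]; /* reported since the last barrier was issued */
    int npending;
    struct completion held[MAXPEND];    /* covered by the barrier now in flight */
    int nheld;
    bool barrier_in_flight;
    int processed;                      /* completions released to SU processing */
};

/* The disk says a write is done; SU must not trust that yet, so queue it. */
static void write_done(struct barrier_state *s, int blkno)
{
    assert(s->npending < MAXPEND);
    s->pending[s->npending++] = (struct completion){ .blkno = blkno };
}

/*
 * Periodic tick (every 100ms in the proposal) or an explicit request from
 * fsync: issue a barrier covering every completion reported so far.
 * Completions reported after this point wait for the next barrier, which
 * is conservative but safe.
 */
static void issue_barrier(struct barrier_state *s)
{
    if (s->barrier_in_flight)
        return;                         /* one barrier at a time in this sketch */
    for (int i = 0; i < s->npending; i++)
        s->held[i] = s->pending[i];
    s->nheld = s->npending;
    s->npending = 0;
    s->barrier_in_flight = true;
}

/* The barrier completed: everything it covers is now on stable storage. */
static void barrier_done(struct barrier_state *s)
{
    assert(s->barrier_in_flight);
    for (int i = 0; i < s->nheld; i++) {
        /* A real implementation would now process the soft-updates
         * dependencies attached to s->held[i].blkno; here we only count. */
        s->processed++;
    }
    s->nheld = 0;
    s->barrier_in_flight = false;
}
```

For example, if blocks 10 and 11 are reported complete, a barrier is issued, and block 12 is reported afterwards, SU processing of 10 and 11 waits for that barrier while 12 waits for the next one. The delay seen by a waiter is roughly the tick period plus the barrier latency, which is why the quoted proposal has fsync issue its own barrier instead of waiting for the next tick.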

I do not see how this proposal changes much, except limiting potential
havoc to the last 100ms of system operation. In fact, reordering,
besides causing fs consistency problems, may cause security issues
as well [*]. If user data is written into reused blocks, but the
metadata update was ordered after the data write, we can end up with
an arbitrary overwrite of sensitive authorization or accounting
information.
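The reordering hazard described above can be illustrated with a toy crash model. This is a minimal sketch under stated assumptions: the `struct disk` layout, the sensitive file "A", block number 3 and the record contents are all hypothetical. Two writes are outstanding: the metadata update that removes block 3 from file A, and a user data write into that same, already reused, block. The model commits only a prefix of the writes, in whatever order the disk chose, and then "crashes".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NBLK 8
#define BSZ 16

/* Hypothetical on-disk state: a data area plus the block pointer of a
 * sensitive file A (e.g. an accounting file). -1 means "no block". */
struct disk {
    char blk[NBLK][BSZ];
    int a_blkptr;
};

enum op_kind { OP_META_FREE_A, OP_DATA_WRITE };

struct op {
    enum op_kind kind;
    int blkno;
    const char *data;
};

/* Fresh disk: block 3 belongs to A and holds a privileged record. */
static struct disk fresh_disk(void)
{
    struct disk d;
    memset(&d, 0, sizeof d);
    strcpy(d.blk[3], "acct: root");
    d.a_blkptr = 3;
    return d;
}

/* Make one write durable on the simulated platter. */
static void commit(struct disk *d, const struct op *o)
{
    if (o->kind == OP_META_FREE_A) {
        d->a_blkptr = -1;               /* A no longer references the block */
    } else {
        assert(o->blkno >= 0 && o->blkno < NBLK);
        strncpy(d->blk[o->blkno], o->data, BSZ - 1);
    }
}

/* Commit only the first `ncommitted` ops in the order the disk actually
 * used, then crash. Returns what A would read after recovery, or NULL. */
static const char *read_a_after_crash(struct disk *d, const struct op *order,
                                      int ncommitted)
{
    for (int i = 0; i < ncommitted; i++)
        commit(d, &order[i]);
    return d->a_blkptr >= 0 ? d->blk[d->a_blkptr] : NULL;
}
```

Committing the metadata update first and crashing after one write leaves A with no block. Committing the user data first and crashing at the same point leaves A pointing at bytes the user chose, which is the arbitrary overwrite of accounting or authorization data described above.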

[*] I expect some people to go wonky at this point and immediately remove
the corresponding driver from ports.

P.S. Sorry for being grumpy.


More information about the freebsd-fs mailing list