Does UFS2 send BIO_FLUSH to GEOM when updating metadata (with softupdates)?

Sat Nov 26 08:04:55 UTC 2011

Hello, Kirk.
You wrote on 26 November 2011, 11:25:13:

> You are entirely correct when you say that the requirement for
> SU and SU+J is that it requires that notification of a disk-write
> complete mean that the data is on the disk (stable). The problem
> that arises is that (apparently) some tag-queue implementations
> report back that tags have been written when in fact they have
> not been written.
  Or some GEOM class implements a write cache. Please don't forget that
 the FS no longer asks a disk driver to write a block; it asks the GEOM
 stack, which could be composed of several nodes located on several
 physically independent computers (don't forget about geom_gate, iSCSI,
 etc). Or some hardware implements a big write cache, too.

   Every HDD or controller will report a non-queued write as complete
 after copying the data into its cache (if WC is enabled). And even if
 the cache is backed by a battery, nobody promises to flush it in the
 proper (from the SU point of view) order. It is even worse if the
 cache is not battery-backed (but the server itself IS), or if its
 flush depends on drivers (the GEOM case), and then the system crashes.
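
 The ordering problem above can be shown with a toy model. This is only
 a sketch under stated assumptions: every name in it (toy_cache,
 toy_write, toy_flush_partial, toy_crash_scenario) is hypothetical and
 is not a GEOM or driver API, the block numbers are invented, and the
 cache is assumed to flush in ascending block order.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NBLOCKS     128
#define CACHE_SLOTS 8

/* Hypothetical write-back cache: holds block numbers not yet on media. */
struct toy_cache {
    int blkno[CACHE_SLOTS];
    int count;
};

static bool on_disk[NBLOCKS];   /* which blocks have reached stable storage */

/* The write is acknowledged as soon as it is copied into the cache. */
static bool toy_write(struct toy_cache *c, int blkno)
{
    assert(c->count < CACHE_SLOTS);
    c->blkno[c->count++] = blkno;
    return true;                /* "complete", but not stable */
}

/*
 * Flush at most max_blocks cached blocks, lowest block number first
 * (assumed elevator-like policy). Stopping early models a crash.
 */
static void toy_flush_partial(struct toy_cache *c, int max_blocks)
{
    while (max_blocks-- > 0 && c->count > 0) {
        int lowest = 0;
        for (int i = 1; i < c->count; i++)
            if (c->blkno[i] < c->blkno[lowest])
                lowest = i;
        on_disk[c->blkno[lowest]] = true;
        c->blkno[lowest] = c->blkno[--c->count];
    }
}

/* Returns true if the dependent block is on disk without its prerequisite. */
static bool toy_crash_scenario(void)
{
    struct toy_cache c = { .count = 0 };
    const int inode_blk = 90;   /* must be stable first (invented number) */
    const int dir_blk = 10;     /* refers to the inode (invented number) */

    memset(on_disk, 0, sizeof(on_disk));
    toy_write(&c, inode_blk);   /* acknowledged from cache */
    toy_write(&c, dir_blk);     /* issued only after that acknowledgement */
    toy_flush_partial(&c, 1);   /* crash after a single block is flushed */
    return on_disk[dir_blk] && !on_disk[inode_blk];
}
```

 In this model toy_crash_scenario() returns true: the FS issued the
 writes in the right order and waited for the first acknowledgement,
 yet only the dependent block survived the crash.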

> I believe that the only way to ensure that a tagged request is
> on stable store is to send a BIO_BARRIER request to the disk. The
> BIO_BARRIER request is not supposed to return until all I/O
> requests that were sent down prior to the BIO_BARRIER have been
> committed to stable store.
  IMHO, the idea of a per-request flag, which the driver will translate
 into appropriate device flags (maybe into a barrier, maybe not; it
 depends on device capabilities), is much better. BIO_BARRIER will
 flush ALL of the write cache by design. It is a barrier, and it has no
 references to previous requests; it is a flush-them-all request. It
 could have a HUGE performance impact if you flush the large write
 cache of a controller every 100ms. But if SU/fsync()/O_SYNC requests
 are marked with a special flag, the GEOM stack and the controller will
 be able to process these requests separately on the one hand, and will
 not flush the cache on a timer basis on the other, where that is
 possible. Maybe on some hardware it will have the same effect as a
 barrier, but I'm sure that there IS hardware which could handle such
 requests much more efficiently than a full cache flush. And, yes, GEOM
 too. Again, I, as the maintainer of geom_raid5, know how vital it is
 to have a good cache in this module (with some requests residing in it
 for tens of seconds!), and I don't see any way to implement a barrier
 other than flushing the cache on each barrier, which effectively
 disables the cache altogether.
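
 To make the cost difference concrete, here is a rough sketch that
 counts how many blocks are forced to media under each policy. All of
 it is an assumption for illustration: toy_wcache, toy_barrier,
 toy_stable_write and toy_run are hypothetical names, not existing GEOM
 or CAM interfaces, and the workload (bulk ordinary writes plus one
 must-be-stable write per round) is invented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_CACHE_SLOTS 64

/* Hypothetical write cache that counts writes forced ahead of schedule. */
struct toy_wcache {
    int    dirty[TOY_CACHE_SLOTS];
    size_t ndirty;
    size_t forced_writes;
};

/* Ordinary write: stays in the cache until the cache decides to retire it. */
static void toy_cached_write(struct toy_wcache *c, int blkno)
{
    if (c->ndirty == TOY_CACHE_SLOTS)
        c->ndirty--;            /* pretend background write-back freed a slot */
    c->dirty[c->ndirty++] = blkno;
}

/* Barrier: everything queued so far must be stable, so flush it all. */
static void toy_barrier(struct toy_wcache *c)
{
    c->forced_writes += c->ndirty;
    c->ndirty = 0;
}

/* Flagged write: only this block bypasses the cache; the rest stays put. */
static void toy_stable_write(struct toy_wcache *c, int blkno)
{
    (void)blkno;
    c->forced_writes += 1;
}

/* Each round: `bulk` ordinary writes, then one write that must be stable. */
static size_t toy_run(bool use_barrier, int rounds, int bulk)
{
    struct toy_wcache c = { .ndirty = 0, .forced_writes = 0 };

    assert(rounds >= 0 && bulk >= 0);
    for (int r = 0; r < rounds; r++) {
        for (int i = 0; i < bulk; i++)
            toy_cached_write(&c, r * bulk + i);
        if (use_barrier) {
            toy_cached_write(&c, -1 - r);   /* the metadata write */
            toy_barrier(&c);                /* drags all cached data along */
        } else {
            toy_stable_write(&c, -1 - r);   /* metadata write only */
        }
    }
    return c.forced_writes;
}
```

 With 10 rounds of 20 ordinary writes each, toy_run(true, 10, 20) gives
 210 forced writes and toy_run(false, 10, 20) gives 10. Real hardware
 may not be able to honour such a flag any more cheaply than a barrier;
 the sketch only shows what is gained where it can.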

-- 
// Black Lion AKA Lev Serebryakov <lev at serebryakov.spb.ru>