Does UFS2 send BIO_FLUSH to GEOM when updating metadata (with
softupdates)?
Lev Serebryakov
lev at freebsd.org
Sat Nov 26 08:04:55 UTC 2011
Hello, Kirk.
You wrote on 26 November 2011, 11:25:13:
> You are entirely correct when you say that the requirement for
> SU and SU+J is that it requires that notification of a disk-write
> complete mean that the data is on the disk (stable). The problem
> that arises is that (apparently) some tag-queue implementations
> report back that tags have been written when in fact they have
> not been written.
Or some GEOM class implements a write cache. Please don't forget that
the FS no longer asks a disk driver to write a block; it asks the GEOM
stack, which can be composed of several nodes located on several
physically independent computers (don't forget about geom_gate, iSCSI,
etc.). Or some hardware implements a big write cache, too.
Every HDD or controller will report a non-queued write as complete
after copying the data into its cache (if WC is enabled). And even if
the cache is backed by a battery, nobody promises to flush it in the
proper (from the SU point of view) order. Even worse is the case where
the cache is not battery-backed (but the server itself IS), or its
flush depends on drivers (the GEOM case), and then the system crashes.
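
To make the ordering problem concrete, here is a toy Python model of a
volatile write cache. It is only a sketch: the class, its methods and
the block numbers are hypothetical and do not correspond to any GEOM
or driver interface. It assumes a cache that acknowledges a write as
soon as the data is cached and later flushes in ascending block order
rather than in submission order.

```python
class WriteBackCache:
    """Toy model of a volatile write cache (hypothetical, not a GEOM or driver API)."""

    def __init__(self):
        self.pending = {}   # block number -> data, acknowledged but not yet stable
        self.stable = {}    # block number -> data, actually on stable storage

    def write(self, block, data):
        # The write is reported complete as soon as it lands in the cache.
        self.pending[block] = data
        return "complete"

    def flush_some(self, count):
        # The cache picks its own order (here: ascending block number),
        # which need not match the order the file system submitted.
        for block in sorted(self.pending)[:count]:
            self.stable[block] = self.pending.pop(block)

    def crash(self):
        # Power loss or panic: everything still in the cache is gone.
        self.pending.clear()


def demo_reordering():
    cache = WriteBackCache()
    # Assumed soft updates dependency: the inode (block 900) must be
    # stable before the directory entry (block 100) that points to it.
    assert cache.write(900, "inode") == "complete"
    assert cache.write(100, "dirent -> inode") == "complete"
    cache.flush_some(1)   # the cache happens to flush block 100 first
    cache.crash()
    return cache.stable
```

In this model both writes were reported complete, yet after the crash
only the directory entry is on stable storage and the inode it points
to is not.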
> I believe that the only way to ensure that a tagged request is
> on stable store is to send a BIO_BARRIER request to the disk. The
> BIO_BARRIER request is not supposed to return until all I/O
> requests that were sent down prior to the BIO_BARRIER have been
> committed to stable store.
IMHO, the idea of a per-request flag, which the driver translates into
appropriate device flags (maybe into a barrier, but maybe not -- it
depends on device capabilities), is much better. BIO_BARRIER will
flush ALL of the write cache by design. It is a barrier: it has no
references to previous requests, it is a flush-them-all request.
Flushing a controller's large write cache every 100ms could have a
HUGE performance impact. But if SU/fsync()/O_SYNC requests are marked
with a special flag, the GEOM stack and the controller will be able to
process these requests separately on the one hand, and on the other
will not have to flush the cache on a timer basis, where that is
possible. Maybe on some hardware it will have the same effect as a
barrier, but I'm sure that there IS hardware which could handle such
requests much more efficiently than a full cache flush. And, yes, GEOM
too. Again, I, as the maintainer of geom_raid5, know how vital it is
to have a good cache in this module (with some requests residing in it
for tens of seconds!), and I don't see any way to implement a barrier
other than flushing the cache on each barrier -- which effectively
disables the cache altogether.
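
A rough way to see the difference in cost is the toy Python model
below. It is a sketch under stated assumptions, not an implementation
of BIO_BARRIER or of any existing flag: `CachingNode`, the `stable`
argument and `forced_writes` are hypothetical names, and the flagged
request is assumed to be written through on its own while the rest of
the cache stays untouched.

```python
class CachingNode:
    """Toy model of a caching layer (hypothetical; not a real GEOM class)."""

    def __init__(self):
        self.dirty = {}         # cached writes not yet on stable storage
        self.stable = {}        # writes that reached stable storage
        self.forced_writes = 0  # blocks pushed out earlier than the cache wanted

    def write(self, block, data, stable=False):
        if stable:
            # Per-request flag: write this one block through, leave the rest cached.
            self.dirty.pop(block, None)
            self.stable[block] = data
            self.forced_writes += 1
        else:
            self.dirty[block] = data

    def barrier(self):
        # A barrier names no particular request, so every dirty block must go.
        self.forced_writes += len(self.dirty)
        self.stable.update(self.dirty)
        self.dirty.clear()


def compare(cached_blocks=1000):
    with_barrier = CachingNode()
    with_flag = CachingNode()
    for node in (with_barrier, with_flag):
        for block in range(cached_blocks):
            node.write(block, "bulk data")
    # One metadata update that must be stable before the FS continues.
    with_barrier.write(5000, "metadata")
    with_barrier.barrier()
    with_flag.write(5000, "metadata", stable=True)
    return with_barrier.forced_writes, with_flag.forced_writes
```

With 1000 cached blocks, the barrier variant in this model pushes out
1001 blocks to make one metadata write stable, while the flagged
variant pushes out 1. Real hardware may of course treat such a flag
exactly like a barrier.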
--
// Black Lion AKA Lev Serebryakov <lev at serebryakov.spb.ru>