Does UFS2 send BIO_FLUSH to GEOM when updating metadata (with softupdates)?

Lev Serebryakov lev at FreeBSD.org
Wed Nov 23 22:21:52 UTC 2011


Hello, Kostik.
You wrote on 23 November 2011, 23:44:44:

> You are making wrong conclusions from the false assumptions.
  It seems to me that FFS2/SU and SUJ are built on the wrong assumption
that a completed bwrite() without the ASYNC flag really means the data
has landed on the physical device in every case. That is completely
wrong and unreliable, and it prevents building reliable AND
high-performance storage on FreeBSD :(
  Or maybe I understand the code wrong. That is possible. See below.

> The only requirement of the SU is that writes reported as done by disk
> driver are indeed safely landed in the involatile storage.
  I've traced the code and found the following call chain. Please
correct me if I'm wrong.

  Softupdates writes all data via bwrite() or bawrite(). bawrite()
is OK: it is async, so it should not give any guarantees about an
immediate cache flush or ordering.

  A bwrite() call (in most cases, and in the case of GEOM backing) ends
in the strategy routine, which is g_vfs_strategy() for GEOM (GEOM uses
the generic bufwrite(), which tweaks some flags, does some checks and
sends the struct buf to bop_strategy, which is g_vfs_strategy() in the
case of GEOM).
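
  To make the chain above concrete, here is a minimal userspace model
of it in C. It is only a sketch: every model_* name and the flag value
are invented for illustration and are not the kernel's definitions. The
only thing it mirrors is the shape of the dispatch as described above
(write entry point, generic bufwrite step, bop_strategy pointer,
GEOM-side strategy routine).

```c
/* Userspace model only; none of this is the real kernel code. */
#include <assert.h>
#include <stddef.h>

#define MODEL_B_ASYNC 0x1   /* illustrative value standing in for the ASYNC flag */

struct model_buf;

/* Stand-in for the buffer operations vector (bop_strategy in the text). */
struct model_bufobj_ops {
    void (*bop_strategy)(struct model_buf *bp);
};

struct model_buf {
    int flags;          /* MODEL_B_ASYNC or 0 */
    int reached_geom;   /* set once the GEOM-side strategy routine ran */
    int bio_flags;      /* flags placed on the request handed downwards */
    const struct model_bufobj_ops *ops;
};

/* Stand-in for g_vfs_strategy(): forwards the request, never reads bp->flags. */
void model_g_vfs_strategy(struct model_buf *bp)
{
    bp->reached_geom = 1;
    bp->bio_flags = 0;  /* same for sync and async requests in this model */
}

const struct model_bufobj_ops model_geom_ops = {
    .bop_strategy = model_g_vfs_strategy,
};

/* Stand-in for the generic bufwrite(): some checks, then the strategy call. */
void model_bufwrite(struct model_buf *bp)
{
    assert(bp != NULL && bp->ops != NULL);
    bp->ops->bop_strategy(bp);
}

/* Stand-in for bwrite(): a write without the ASYNC flag. */
void model_bwrite(struct model_buf *bp)
{
    bp->flags &= ~MODEL_B_ASYNC;
    model_bufwrite(bp);
}

/* Stand-in for bawrite(): the same path, with the ASYNC flag set. */
void model_bawrite(struct model_buf *bp)
{
    bp->flags |= MODEL_B_ASYNC;
    model_bufwrite(bp);
}
```

  In this model, a buffer pushed through model_bwrite() and one pushed
through model_bawrite() arrive at the strategy routine with identical
bio_flags, which is the point of my complaint.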

  g_vfs_strategy() sends the request WITHOUT looking at the ASYNC flag
on "struct buf". We have a BIO_ORDERED flag, but it is not used on this
code path!

  Maybe a cheap solution would be to set BIO_ORDERED on every struct
bio that is created for a struct buf without the ASYNC flag? Or is that
too strict?
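
  As a sketch of that idea (not a patch), the flag translation could
look roughly like the following userspace model. The struct layouts,
the model_* names and the flag values are all assumptions made for
illustration; real code would have to work on the kernel's struct buf
and struct bio inside the strategy routine.

```c
/* Userspace model only; names and values are invented for illustration. */
#include <assert.h>
#include <stddef.h>

#define MODEL_B_ASYNC     0x1  /* stands in for the ASYNC flag on struct buf */
#define MODEL_BIO_ORDERED 0x1  /* stands in for BIO_ORDERED on struct bio */

struct model_buf {
    int b_flags;
};

/*
 * Compute the flags for the bio built from bp: any buffer written
 * without the ASYNC flag is marked as ordered, async ones are not.
 */
int model_bio_flags_for(const struct model_buf *bp)
{
    int bio_flags = 0;

    assert(bp != NULL);
    if ((bp->b_flags & MODEL_B_ASYNC) == 0)
        bio_flags |= MODEL_BIO_ORDERED;
    return bio_flags;
}
```

  Whether marking every non-ASYNC write this way costs too much
performance is exactly the open question.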

  Please note: right now GEOM cannot guarantee even the ordering of SU
write requests! Disk drivers, which at least send such requests to the
hardware, could queue them too or leave them in the drive's cache.
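
  To illustrate why ordering can be lost when requests carry no
ordering flag, here is a toy elevator-style queue in C. It is a
userspace sketch under my own assumptions: it sorts unordered requests
by offset and treats an ordered request as a barrier. It is not the
kernel's queueing code, and all model_* names are hypothetical.

```c
/* Toy model of a sorted request queue; not the kernel implementation. */
#include <assert.h>
#include <stddef.h>

#define MODEL_QUEUE_MAX 16

struct model_req {
    long offset;   /* where on the device the write goes */
    int ordered;   /* nonzero: later requests must not pass this one */
};

struct model_queue {
    struct model_req reqs[MODEL_QUEUE_MAX];
    size_t count;
    size_t barrier;   /* index just past the last ordered request */
};

void model_queue_insert(struct model_queue *q, struct model_req r)
{
    size_t pos;

    assert(q != NULL && q->count < MODEL_QUEUE_MAX);
    if (r.ordered) {
        /* Barrier: goes to the tail, and nothing queued later may pass it. */
        q->reqs[q->count++] = r;
        q->barrier = q->count;
        return;
    }
    /* Unordered: sorted by offset, but only among requests after the barrier. */
    pos = q->count;
    while (pos > q->barrier && q->reqs[pos - 1].offset > r.offset) {
        q->reqs[pos] = q->reqs[pos - 1];
        pos--;
    }
    q->reqs[pos] = r;
    q->count++;
}
```

  If a write at offset 900 is queued first and a write at offset 100
second, both unordered, the queue dispatches the second one first. If
the first write is marked ordered, the submission order is kept.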

   It is COMPLETELY WRONG!

    With such a disconnect between the top-level logic (softupdates)
and the whole driver stack (GEOM and disk drivers), I am surprised that
FFS2 can be repaired after a panic at all!

   IMHO, it should be fixed ASAP, and FFS2 should notify the lower
layers about writes that are required to be ordered & landed before
bwrite() returns! We have the BIO_ORDERED flag, which could be used, but
if it is too strict, we could add a BIO_SYNC flag, too.

  The ATA/SCSI subsystems already have proper support for BIO_ORDERED,
and adding BIO_SYNC would not be a big deal at the low level. It could
also be easily added to g_vfs_strategy(), but I'm not sure that it
would not hurt performance too much -- I'm not sure that every buf
write without the ASYNC flag should be strict-SYNC. But I AM SURE that
SU/SU+J writes MUST BE DONE STRICT SYNC.

P.S. I added geom@ to CC:, as this seems to be a UFS<->GEOM interaction
problem.

-- 
// Black Lion AKA Lev Serebryakov <lev at FreeBSD.org>


