Does UFS2 send BIO_FLUSH to GEOM when updating metadata (with softupdates)?
Pawel Jakub Dawidek
pjd at FreeBSD.org
Mon Dec 5 08:02:50 UTC 2011
On Thu, Dec 01, 2011 at 06:36:26PM -0800, Kirk McKusick wrote:
> > > No. Softupdates do not need flushes.
> >
> > Well, they do for two reasons:
> > 1. To properly handle sync operations (fsync(2), O_SYNC).
> > 2. To maintain consistent on-disk structures.
>
> SU does not use synchronous writes to maintain consistency (see below).
> It rarely uses synchronous writes even to implement fsync. Instead it
> issues async I/O requests for all the blocks necessary to ensure that
> the inode in question is stable. It then waits until all of those writes
> have been acknowledged. When they have been acknowledged, the fsync
> returns. If the subsystem acknowledges them before they are truly on
> stable store, then SU and correspondingly the caller of fsync is screwed.
>
> The one place where a synchronous write (bwrite) is used is when the
> completion of a particular I/O is needed to make progress on many other
> things (usually an update to the SUJ log) in which case SU will force a
> flush on that I/O (e.g., bwrite it) to hasten it along.
We are clearly talking about different layers :) By sync writes I meant
disk writes, i.e., when you send a write to the disk, it acks the write
only when the data has hit stable storage, not the disk write cache.
> > The second point is there because BIO_FLUSH is the only way to avoid
> > reordering (apart from turning off the disk write cache).
> >
> > SU assumes no I/O reordering will happen, which is a very weak assumption.
>
> SU has no problems with reordering. It makes no assumptions on the
> order in which its I/O operations are done. Its only requirement is
> that it not be told that something is stable before it truly is
> stable, because the way that it maintains consistency is to not
> issue a write on something before it knows that something that it
> depends on for consistency is in fact stable. But it has no problem
> with its outstanding writes being done in any order.
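The ordering rule in the quoted paragraph can be modeled as a dependency graph: a block is written only after everything it depends on is stable, and the writes in flight at any moment may complete in any order. A minimal sketch, assuming an honest device (an acknowledgement means the block is stable) and a toy model where a block is either `dirty` (pending update) or stable. `struct su_model` and `su_flush_all` are hypothetical names, not the real softupdates implementation, and this sketch simply reports a dependency cycle as an error.

```c
#include <assert.h>
#include <stdbool.h>

#define NBLK 8

/* Hypothetical model: deps[b][d] means block b must not be written
 * until block d is stable. A block is stable iff it is not dirty. */
struct su_model {
    bool deps[NBLK][NBLK];
    bool dirty[NBLK];   /* block has an update not yet on stable store */
};

/* A block may be issued once none of its dependencies are dirty. */
static bool su_ready(const struct su_model *m, int b)
{
    for (int d = 0; d < NBLK; d++)
        if (m->deps[b][d] && m->dirty[d])
            return false;
    return true;
}

/* Repeatedly issue every ready dirty block as one batch, then let the
 * device complete that batch in reverse order (any order would do).
 * Returns the number of batches, or -1 if some blocks can never be
 * written because of a dependency cycle. */
static int su_flush_all(struct su_model *m)
{
    int batches = 0;
    for (;;) {
        int batch[NBLK], n = 0;
        for (int b = 0; b < NBLK; b++)
            if (m->dirty[b] && su_ready(m, b))
                batch[n++] = b;
        if (n == 0)
            break;
        batches++;
        /* No block in the batch depends on another block in the batch,
         * because every member was dirty when readiness was checked. */
        for (int i = n - 1; i >= 0; i--)
            m->dirty[batch[i]] = false;   /* acknowledged => stable */
    }
    for (int b = 0; b < NBLK; b++)
        if (m->dirty[b])
            return -1;
    return batches;
}
```

For example, with an inode block 0, a directory block 1 that depends on it, and an unrelated block 2, the first batch issues blocks 0 and 2 together and the second batch issues block 1.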
Again, different layers :) The reordering I worry about is at the disk level.
When you write a bunch of blocks, they are really stored in the disk write
cache. The disk sends you an ack even though the data is not on stable
storage yet. Under the assumption that the previous data is safe, you send
more data, which is also placed in the disk write cache. At some point the
disk decides to flush its write cache, but it will most likely sort the
blocks by offset, which will mess up your initial ordering, and the data
you sent last may hit the disk first.
To prevent this reordering, you need to either turn off the disk write
cache or send a BIO_FLUSH between those two groups of writes.
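The failure mode and the fix can be modeled in a few lines of userspace C. This is a toy simulation, not GEOM code: `struct wc_disk`, `wc_write`, `wc_flush` (standing in for BIO_FLUSH), `wc_power_loss` and `create_file` are hypothetical, and the disk is assumed to destage its cache lowest offset first, as described above. The example writes an inode and then a directory entry that names it, and checks whether a power loss can leave the name on the media without the inode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define NBLOCKS     64
#define CACHE_SLOTS 16

/* One write sitting in the volatile disk write cache. */
struct wc_entry {
    int blk;  /* block offset */
    int val;  /* data written */
    int seq;  /* arrival order, so rewrites of one block stay ordered */
};

/* Hypothetical disk: `media` survives power loss, `cache` does not. */
struct wc_disk {
    int media[NBLOCKS];
    struct wc_entry cache[CACHE_SLOTS];
    int ncached;
    int nextseq;
};

/* Accept a write into the cache and "ack" it at once, as a disk with
 * write caching enabled does; nothing reaches the media yet. */
static void wc_write(struct wc_disk *d, int blk, int val)
{
    assert(d->ncached < CACHE_SLOTS && blk >= 0 && blk < NBLOCKS);
    struct wc_entry e = { blk, val, d->nextseq++ };
    d->cache[d->ncached++] = e;
}

/* Sort key: lowest offset first; same block keeps arrival order. */
static int by_offset(const void *a, const void *b)
{
    const struct wc_entry *x = a, *y = b;
    if (x->blk != y->blk)
        return x->blk < y->blk ? -1 : 1;
    return (x->seq > y->seq) - (x->seq < y->seq);
}

/* Move up to `limit` cached writes to the media, in offset order rather
 * than in the order the host issued them. */
static void wc_destage(struct wc_disk *d, int limit)
{
    qsort(d->cache, (size_t)d->ncached, sizeof d->cache[0], by_offset);
    int n = limit < d->ncached ? limit : d->ncached;
    for (int i = 0; i < n; i++)
        d->media[d->cache[i].blk] = d->cache[i].val;
    memmove(d->cache, d->cache + n,
            (size_t)(d->ncached - n) * sizeof d->cache[0]);
    d->ncached -= n;
}

/* Stand-in for BIO_FLUSH: returns only once the whole cache is on media. */
static void wc_flush(struct wc_disk *d) { wc_destage(d, d->ncached); }

/* Power fails after `destaged` cached blocks made it out; the rest is lost. */
static void wc_power_loss(struct wc_disk *d, int destaged)
{
    wc_destage(d, destaged);
    d->ncached = 0;
}

enum { DIRENT_BLK = 10, INODE_BLK = 50 };

/* Write an inode (group 1), then a directory entry naming it (group 2).
 * Returns true if the media is consistent after the power loss, i.e. the
 * entry is never on the media without the inode it names. */
static bool create_file(bool flush_between, int destaged)
{
    struct wc_disk d = { 0 };
    wc_write(&d, INODE_BLK, 1);
    if (flush_between)
        wc_flush(&d);
    wc_write(&d, DIRENT_BLK, 1);
    wc_power_loss(&d, destaged);
    return !(d.media[DIRENT_BLK] == 1 && d.media[INODE_BLK] == 0);
}
```

Without the flush, losing power after one block has been destaged leaves the directory entry (lower offset, sent last) on the media and the inode missing. With the flush between the two groups, no crash point produces that state. How a real driver implements BIO_FLUSH is outside the scope of this sketch.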
--
Pawel Jakub Dawidek http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://yomoli.com
More information about the freebsd-fs mailing list