UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY
dillon at apollo.backplane.com
Tue Sep 30 19:00:31 UTC 2008
:The topic of BIO_FLUSH is something I got to thinking about last night
:at work; the only condition where a disk with write caching enabled
:*would not* fully write the data to the platter would in fact be power
:loss. All other conditions (specifically soft reset and panic) should
:not require explicit flushing.
:I wonder why this is being done, especially on shutdown of FreeBSD.
:Assuming I understand it correctly, I'm talking about this:
:Waiting (max 60 seconds) for system process `bufdaemon' to stop...done
:Waiting (max 60 seconds) for system process `syncer' to stop...
:Syncing disks, vnodes remaining...3 3 3 2 2 0 0 done
:All buffers synced.
:| Jeremy Chadwick jdc at parodius.com |
BIO_FLUSH and "Syncing disks, vnodes ..." are two different things,
so I'm not sure of the context but I will describe issues with both.
BIO_FLUSH commands the disk firmware to flush out any dirty buffers in
its drive cache. That is, writes that you have *already* issued to
the drive and which returned completion, but which have not actually
made it to the physical media yet. This is different from dirty buffers
still being maintained by the kernel which have not yet been sent to
the drive. (Just repeating this so the definition is clear to all.)
So, yes, you would want to do a BIO_FLUSH before powering down a
machine (halt -p) to ensure that all the dirty data you sent to the
disk actually gets to the platter.
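To make the three places a write can live concrete, here is a toy model in C. It is a hypothetical sketch, not FreeBSD code: `struct write_path`, `kernel_write()`, `drive_flush()` and `survives_power_off()` are invented names, and `drive_flush()` merely stands in for what BIO_FLUSH asks the firmware to do.

```c
#include <stdbool.h>

/* Toy model of where one block of data can live. Hypothetical names;
 * this is not FreeBSD code. */
struct write_path {
    bool kernel_dirty; /* dirty in the kernel buffer cache, not yet sent */
    bool drive_cache;  /* sent, acknowledged, but only in drive RAM     */
    bool platter;      /* actually on the physical media                */
};

/* The kernel issues the write. The drive reports completion as soon
 * as the data is in its volatile cache. */
static void kernel_write(struct write_path *w)
{
    if (w->kernel_dirty) {
        w->kernel_dirty = false;
        w->drive_cache = true;
    }
}

/* Stand-in for BIO_FLUSH: the firmware commits already-acknowledged
 * data from its cache to the media. */
static void drive_flush(struct write_path *w)
{
    if (w->drive_cache) {
        w->drive_cache = false;
        w->platter = true;
    }
}

/* Power loss wipes everything that only lived in RAM. */
static void power_loss(struct write_path *w)
{
    w->kernel_dirty = false;
    w->drive_cache = false;
}

/* Write one block, optionally flush the drive, then cut power.
 * Returns true if the data survived. */
static bool survives_power_off(bool flush_first)
{
    struct write_path w = { .kernel_dirty = true };

    kernel_write(&w);   /* "completed" from the kernel's point of view */
    if (flush_first)
        drive_flush(&w);
    power_loss(&w);
    return w.platter;
}
```

In this model `survives_power_off(false)` returns false even though the kernel saw the write complete, while `survives_power_off(true)` returns true.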
I think you also want to issue it for a soft reset. It would not
affect a SATA drive but it certainly would affect a USB drive powered
from the computer. USB ports will be powered down during a soft
reset. BIO_FLUSH isn't likely to cause problems during a crash, unlike
flushing the buffer cache.
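The soft-reset argument can be sketched as a small model: a power-off cuts power to every drive, while a soft reset, as described above, only cuts power to devices fed from the host's ports. The names (`enum shutdown_kind`, `struct drive`, `blocks_lost()`) are hypothetical and this is not kernel code.

```c
#include <stdbool.h>

/* Hypothetical model; names are illustrative, not a kernel API. */
enum shutdown_kind { SOFT_RESET, POWER_OFF };

struct drive {
    bool bus_powered;   /* e.g. a USB drive powered from the host port */
    int  cached_blocks; /* acknowledged writes still in the drive cache */
};

/* Power-off cuts everything; a soft reset only cuts power to drives
 * fed from the (USB) ports. */
static bool loses_power(const struct drive *d, enum shutdown_kind k)
{
    return k == POWER_OFF || d->bus_powered;
}

/* Blocks lost if the machine goes down with or without first
 * flushing the drive cache (the BIO_FLUSH step). */
static int blocks_lost(struct drive d, enum shutdown_kind k, bool flushed)
{
    if (flushed)
        d.cached_blocks = 0;   /* everything reached the media */
    return loses_power(&d, k) ? d.cached_blocks : 0;
}
```

With eight cached blocks, an unflushed soft reset loses nothing on the SATA drive but all eight on the bus-powered USB drive; flushing first makes both cases safe.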
Some people may remember that earlier versions of Windows XP often powered
the machine down before the hard drive had managed to write all of its data
to the platter. Sometimes that would even destroy sectors on the drive.
We know bad things happen if we don't issue the command, so best not to
take chances by making assumptions.
The "Syncing disks, vnodes ..." is the kernel flushing out any dirty
data in the buffer cache which has not yet been sent to the disk.
This is more problematic. Filesystems such as HAMMER (and presumably
ZFS) absolutely do NOT want the system to flush dirty buffers unless
they explicitly give permission to do so, because the dirty buffers
might represent data for which the recovery information has not yet
been written out, and thus can corrupt the filesystem on-media if a
crash were to occur right then.
In HAMMER's case I enhanced the bioops a bit to allow HAMMER to veto
write-outs initiated by the system. sync_on_panic is irrelevant;
the buffers will not be synced without HAMMER's permission and it
won't give it.
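The idea of a filesystem veto over system-initiated write-outs can be sketched as a per-buffer callback that the sync loop consults before writing. This does not reproduce the real bioops interface; `struct buf_model`, `may_write`, `veto_unless_recoverable()` and `sync_all()` are invented for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical illustration of a filesystem veto over system-initiated
 * write-outs. Names are invented; this is not the real bioops API. */
struct buf_model;
typedef bool (*may_write_fn)(const struct buf_model *bp);

struct buf_model {
    bool dirty;
    bool written;
    bool recovery_info_on_media; /* has the fs written its recovery data? */
    may_write_fn may_write;      /* NULL: the fs imposes no restriction   */
};

/* A HAMMER-like policy: refuse unless recovery data is already safe. */
static bool veto_unless_recoverable(const struct buf_model *bp)
{
    return bp->recovery_info_on_media;
}

/* System-initiated sync (e.g. from a panic path): write only the
 * buffers the owning filesystem permits. Returns the number written. */
static size_t sync_all(struct buf_model *bufs, size_t n)
{
    size_t written = 0;

    for (size_t i = 0; i < n; i++) {
        struct buf_model *bp = &bufs[i];
        if (!bp->dirty)
            continue;
        if (bp->may_write != NULL && !bp->may_write(bp))
            continue;            /* vetoed: leave it dirty in memory */
        bp->dirty = false;
        bp->written = true;
        written++;
    }
    return written;
}
```

The design point is that the decision lives with the filesystem, which knows whether its recovery information is on the media, rather than with the generic sync code.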
There is also the very real general case where a traditional filesystem
such as UFS must perform multiple buffer cache ops, dirtying multiple
buffer cache buffers, in order to complete an operation. If a crash
were to occur right in the middle of such a sequence the kernel would
wind up writing dirty buffers related to incomplete operations to the
media, resulting in corruption.
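A minimal sketch of the problem, assuming a hypothetical file create that dirties two buffers (a directory block and an inode block) and, in this model, dirties the directory block first. If the crash path flushes whatever is dirty, a crash between the two steps puts a directory entry on the media that points at an uninitialized inode. None of these names (`struct pending`, `crash_flush()`, `consistent_after_crash()`) come from UFS.

```c
#include <stdbool.h>

/* Hypothetical model of a file create that must dirty two buffers.
 * Not UFS code; names are illustrative only. */
struct fs_model {
    bool inode_initialized;  /* on media */
    bool dirent_present;     /* on media */
};

struct pending {
    bool inode_dirty;        /* in-memory buffer holds the new inode  */
    bool dirent_dirty;       /* in-memory buffer holds the new dirent */
};

/* Step 0 dirties the directory block, step 1 the inode block. */
static void create_step(struct pending *p, int step)
{
    if (step == 0)
        p->dirent_dirty = true;
    if (step == 1)
        p->inode_dirty = true;
}

/* The crash path writes out whatever happens to be dirty. */
static void crash_flush(const struct pending *p, struct fs_model *m)
{
    if (p->inode_dirty)
        m->inode_initialized = true;
    if (p->dirent_dirty)
        m->dirent_present = true;
}

/* A directory entry pointing at an uninitialized inode is corruption. */
static bool consistent(const struct fs_model *m)
{
    return !m->dirent_present || m->inode_initialized;
}

/* Run the first steps_done steps, crash, flush, and check the media. */
static bool consistent_after_crash(int steps_done)
{
    struct pending p = { false, false };
    struct fs_model m = { false, false };

    for (int s = 0; s < steps_done; s++)
        create_step(&p, s);
    crash_flush(&p, &m);
    return consistent(&m);
}
```

Crashing before the operation starts or after it finishes leaves the media consistent; crashing after only one of the two steps does not.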
In the case of softupdates one is presented with a conundrum. If you
don't write out the buffer cache during a crash you stand to lose a lot
more than 60 seconds' worth of changes due to deep dependency chains.
One 'sync' doesn't do the job. Even though it is supposed to get all
the primary data and meta-data onto the disk and leave only the bitmap
updates for background operations, it doesn't always seem to do that.
The softupdates code is very fragile.
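Why a single sync pass is not enough can be sketched with a simplified dependency model: a buffer may only reach the media once the buffer it depends on has, and because writes are asynchronous, a pass can only issue writes whose dependencies were already satisfied when the pass began. Under those assumptions a chain of depth N needs N passes. Real soft updates rolls back unsatisfied changes before writing a buffer rather than blocking it, which has the same effect of requiring another write later; `struct dep_model`, `sync_pass()` and `passes_to_clean()` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAXBUF 64

/* Hypothetical, simplified soft-updates-style ordering: buffer i may be
 * written only once buffer dep[i] is on media (dep[i] < 0: no dependency).
 * Completions are observed only after a pass ends (asynchronous I/O). */
struct dep_model {
    size_t n;
    int  dep[MAXBUF];
    bool on_media[MAXBUF];
};

/* One sync pass; returns how many buffers it wrote. */
static size_t sync_pass(struct dep_model *m)
{
    bool ready[MAXBUF];
    size_t written = 0;

    /* Snapshot what is writable at the start of the pass. */
    for (size_t i = 0; i < m->n; i++)
        ready[i] = !m->on_media[i] &&
                   (m->dep[i] < 0 || m->on_media[m->dep[i]]);
    for (size_t i = 0; i < m->n; i++) {
        if (ready[i]) {
            m->on_media[i] = true;   /* I/O completes after the pass */
            written++;
        }
    }
    return written;
}

/* Count passes until a pass finds nothing left to write. */
static size_t passes_to_clean(struct dep_model *m)
{
    size_t passes = 0;

    while (sync_pass(m) > 0)
        passes++;
    return passes;
}

/* Build a linear chain: buffer i depends on buffer i - 1. */
static void make_chain(struct dep_model *m, size_t n)
{
    if (n > MAXBUF)
        n = MAXBUF;
    m->n = n;
    for (size_t i = 0; i < n; i++) {
        m->dep[i] = (int)i - 1;
        m->on_media[i] = false;
    }
}
```

In this model a five-deep chain needs five passes, and one pass alone writes only the head of the chain.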
On the other hand, if you *DO* try to write out the buffer cache during
a crash you have a good chance of deadlocking the system or
double-panicking, resulting in inconsistencies on the media, and you
risk doing a partial write-out, also resulting in inconsistencies on the
media.
Here is an example: How does the crash code deal with dirty but locked
buffer cache buffers? Say you have a softupdates filesystem and through
the course of operations you dirty a dozen buffers, then a crash occurs
while you are in the middle of ANOTHER softupdates operation which is
holding several buffers already dirtied by previous operations locked.
What happens now if the crash code tries to sync the buffer cache? Will
it sync the previously dirtied buffers that are currently locked? Will
it sync the ones that haven't been locked but skip the ones that are
locked? You lose both ways. There is no way to safely sync ANYTHING,
whether locked or not, without risking unexpected softupdates
inconsistencies on-media. This alone makes background fsck problematic.
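The "you lose both ways" case can be sketched with two buffers, under stated assumptions: buffer A was dirtied by an earlier operation, buffer B (also dirtied earlier) must not reach the media unless A does, and at crash time a new operation holds A locked and half-modified. Syncing everything writes a torn A; skipping locked buffers writes B without A. The names (`enum crash_policy`, `crash_sync()`, `media_consistent()`) are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical two-buffer crash scenario; names are illustrative. */
enum crash_policy { SYNC_ALL, SKIP_LOCKED };

struct crash_state {
    bool a_locked;        /* A held by the interrupted operation */
    bool a_half_modified; /* A's contents are mid-update         */
};

struct media {
    bool a_written;
    bool a_torn;          /* A written with half-finished changes */
    bool b_written;
};

static struct media crash_sync(struct crash_state s, enum crash_policy p)
{
    struct media m = { false, false, false };
    bool write_a = !(p == SKIP_LOCKED && s.a_locked);

    if (write_a) {
        m.a_written = true;
        m.a_torn = s.a_half_modified;
    }
    m.b_written = true;   /* B is unlocked, so either policy writes it */
    return m;
}

/* Consistent only if A is intact and B's dependency on A holds. */
static bool media_consistent(struct media m)
{
    if (m.a_torn)
        return false;
    if (m.b_written && !m.a_written)
        return false;
    return true;
}
```

With A locked and half-modified, neither policy yields consistent media; only when no operation is in flight does syncing everything come out clean.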
<dillon at backplane.com>
More information about the freebsd-stable