SU: Could an unclean shutdown cause a file with outstanding writes to become sparse after fsck?

Don Lewis truckman at FreeBSD.org
Mon Apr 16 21:40:11 UTC 2012


On 16 Apr, Ryan Stone wrote:
> Today I encountered a system running a very old version of FreeBSD
> (6.1-ish) that was stuck in a reboot loop.  I was eventually able to
> discover that the system was running into a long-since fixed bug where
> the system would panic if you tried to execute a sparse file.  From
> what I've been able to get from the owner of this system, it sounds
> like the machine reset during a system upgrade.  I suspect that the
> initial reset was unrelated (a different long-since fixed panic or a
> power loss, maybe), and that some executables that had outstanding
> writes before the reset ended up becoming sparse when fsck was run.
> Is this possible?  The filesystem was running soft-updates, and I'm
> really not familiar enough with either soft-updates or even the UFS
> on-disk metadata to say whether this is reasonable.

Yes, an unclean shutdown can cause a new file that is being written to
become sparse, especially if you are using tagged commands on SCSI or
NCQ with SATA and write caching disabled.  I've seen it happen.  Even if
the file is being written sequentially, there is no guarantee that the
drive will actually write the data and report the write completions in a
sequential manner.  With SU, the block pointers for the file won't get
written until the drive reports the writes are complete, so if there is
an unclean shutdown, some of the block pointers may still be zero,
creating a sparse file.  At least all of the block pointers that are
present will point to valid data.
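
One rough way to spot such a file after the fact is to compare its
logical size with the space actually allocated to it.  The following is
a minimal sketch in Python, not anything fsck itself does.  It assumes
st_blocks is counted in 512-byte units (true on FreeBSD and Linux), the
helper names are made up for illustration, and filesystems that
compress data or store small files inline can produce false positives.

```python
import os
import stat

BLOCK_UNIT = 512  # assumption: st_blocks is in 512-byte units


def looks_sparse(st):
    """Return True if fewer bytes are allocated than the logical size."""
    return st.st_blocks * BLOCK_UNIT < st.st_size


def find_sparse_files(root):
    """Yield regular files under root that appear to contain holes."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode) and looks_sparse(st):
                yield path
```

Running find_sparse_files() over the directories touched by an
interrupted upgrade would give a list of candidates to reinstall.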

This particular problem would probably not occur if write caching was
enabled and the unclean shutdown was caused by a system panic because
all of the data would probably be in the drive's write cache and would
eventually get written even after the crash.

If write caching is enabled, then all bets are off if there is a power
failure because the unwritten contents of the drive's write cache would
be lost.  Some of the file's block pointers could be pointing to random
garbage, and there could be inconsistencies that fsck can't
automatically fix.  This would require a manual fsck and can cause data
loss.

When install is invoked with the -S flag, it should probably call
fsync() on the destination file after it is done writing and before it
uses rename() to replace the target file.
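
The ordering suggested here (write, fsync(), then rename()) can be
sketched as follows.  This is an illustration in Python rather than the
actual install(1) code, and replace_durably() is a hypothetical helper
name.  It assumes the temporary file lives in the same directory as the
target so that the rename stays within one filesystem.

```python
import os
import tempfile


def replace_durably(data, dest):
    """Write data to a temp file beside dest, fsync it, then rename it."""
    dest_dir = os.path.dirname(os.path.abspath(dest))
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # push Python's buffer to the kernel
            os.fsync(tmp.fileno())  # ask the kernel to flush to the drive
        # os.replace() uses rename(2) on POSIX, so the target name always
        # refers to either the old file or the complete new one.
        os.replace(tmp_path, dest)
    except BaseException:
        # Do not leave a half-written temporary file behind.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Because the data is flushed before the rename, a crash should leave
either the old target or the fully written new one, not a file with
holes.  The sketch does not set the final mode or ownership, and it
does not fsync the containing directory.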

  

