skipping fsck with soft-updates enabled

Oliver Fromme olli at lurza.secnetix.de
Thu Jan 11 08:41:57 UTC 2007


Scott Oertel wrote:
 > I'll probably do some testing with the effects of delaying fsck for long 
 > periods of time using soft-updates. Personally, I haven't found anyone 
 > stating any hard facts that would lead me to believe that running on a 
 > dirty filesystem for an extended period of time won't cause further 
 > inconsistencies.

s/further//

If soft-updates is working correctly (i.e. the drive
honors the write ordering it depends on), then there
are _no_ structural inconsistencies on the file systems
after a crash.  And there's no reason why any
inconsistencies should appear later on.
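
For reference, you can check whether soft-updates is
enabled with the stock tools (the device name below is
just an example, adjust it for your setup):

    # mount
    /dev/ad0s1a on / (ufs, local, soft-updates)
    # tunefs -p /dev/ad0s1a

mount(8) lists "soft-updates" among the mount options,
and the tunefs -p summary includes a "soft updates"
line.  To turn it on for an unmounted filesystem:

    # tunefs -n enable /dev/ad0s1a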

The only thing that's "dirty" about the file systems is
unused space that's still marked as used, which means
that it is not available to new allocations when writing
data.  That's not harmful (unless you run out of disk
space, of course).
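
If you want to see how much space is tied up that way,
a read-only check will report it without modifying
anything.  Assuming /dev/ad0s1a again (on a mounted
filesystem the numbers are only approximate):

    # fsck -t ufs -n /dev/ad0s1a

The unreferenced files and blocks it complains about
are exactly the leaked space described above.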

The only thing that fsck will do in that situation -- no
matter whether regular fsck or background fsck -- is to
find those unused areas and mark them as free.  It does
not matter whether that's done immediately after the
reboot, or a few hours later, or a month later, or even
never at all.  The only drawback is that some disk space
is unavailable for new allocations until fsck cleans it
up.
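
Background fsck and the delay before it starts are
plain rc.conf(5) knobs, by the way.  Something like
this (the values are just examples):

    # /etc/rc.conf
    background_fsck="YES"        # check in background after boot
    background_fsck_delay="60"   # seconds to wait before starting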

All of the above is theory, and it _should_ work exactly
like that.  In practice, every non-trivial piece of code
contains bugs.  In practice, disk drives don't always
behave as the driver expects:  because of misconfiguration
(e.g. enabling write-cache on drives without support for
tagged command queueing), or because of bugs in the
firmware, misunderstanding of the specs, or even intentional
deviations from the spec by the vendor (which isn't all
that unusual, unfortunately).
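
To illustrate the write-cache point:  on ATA drives you
can switch the write cache off globally with a loader
tunable, and on SCSI drives you can at least inspect the
caching mode page with camcontrol (the device name is an
example):

    # In /boot/loader.conf:
    hw.ata.wc="0"

    # On the command line, for a SCSI disk:
    camcontrol modepage da0 -m 8

Turning the cache off costs write performance, but it
removes one common way for a drive to lie about what has
actually reached the platters.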

Furthermore, if the crash is caused by hardware failure
(e.g. power outage, pulling the plug, kicking the hard
drive, disk head crash etc.), then _no_ piece of software
can guarantee anything about the state of the filesystem.
A full, regular fsck (non-background) is required in such
cases, and even then there is no guarantee that you don't
have corrupted files.  The problem is that the system
doesn't seem to be able to detect such cases reliably,
so a foreground check won't be forced automatically.
Another cause of trouble is when a background fsck is
interrupted in the middle by another crash.  In my
experience that's almost guaranteed to cause serious
corruption.
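
In such cases I'd boot single-user and force a full
foreground check by hand before the system comes up all
the way (-f forces the check even if the filesystem is
marked clean, -y answers yes to all repairs):

    # fsck -f -y /dev/ad0s1a

And if you don't want to trust background fsck at all,
you can disable it outright with background_fsck="NO"
in /etc/rc.conf.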

 > Which was what I was hoping to get out of this post, maybe someone will 
 > read it down the line and provide some real facts of why it is or is not 
 > dangerous to delay fsck's for an extended period of time.

Well, above I provided some real facts and explained
some potential risks.  But it's up to you to decide
whether it could be dangerous in your situation or not.

Personally, I have the impression that pjd's new gjournal
code -- even though it's still considered experimental --
is more reliable than background fsck.  It costs a bit of
I/O performance, but if you put the journal on a dedicated
disk, the impact isn't that bad.
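
For what it's worth, a gjournal setup with the journal
on a dedicated disk looks roughly like this (provider
names are examples; see gjournal(8), and remember the
code is still experimental):

    # kldload geom_journal
    # gjournal label /dev/da0 /dev/da1
    # newfs -J /dev/da0.journal
    # mount -o async /dev/da0.journal /mnt

Here da0 holds the data and da1 the journal; the async
mount is safe because the journal itself provides the
ordering guarantees.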

Best regards
   Oliver

-- 
Oliver Fromme,  secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing
Dienstleistungen mit Schwerpunkt FreeBSD: http://www.secnetix.de/bsd
Any opinions expressed in this message may be personal to the author
and may not necessarily reflect the opinions of secnetix in any way.

"What is this talk of 'release'?  We do not make software 'releases'.
Our software 'escapes', leaving a bloody trail of designers and quality
assurance people in its wake."

