SU+J systems do not fsck themselves
David Thiel
lx at redundancy.redundancy.org
Tue Dec 27 22:20:13 UTC 2011
I've now had multiple machines (9.0-RC3, amd64, i386 and earlier
9-CURRENT on ppc) running SU+J that started having unexplained panics
and crashes related to disk I/O. Each time I end up running a full fsck,
it turns out that the disk is dirty and corrupted, but SU+J has no
mechanism in place to detect and fix this. A bgfsck never happens, but a
manual fsck in single-user mode does fix the crashing and weird
behavior. Others have tested their SU+J volumes and found errors as
well. This makes me super nervous.
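For anyone who wants to repeat the manual check described above, here is a minimal sketch. It is not taken from the original report: the helper name full_check and the device /dev/ada0p2 are hypothetical, and it assumes the file system is unmounted or mounted read-only (for example in single-user mode). By default it only prints the commands; set RUN=1 to execute them.

```shell
#!/bin/sh
# Hypothetical helper: print (or, with RUN=1, execute) the steps for a
# forced full check of one UFS file system. The device name is an example.
full_check() {
    dev="$1"
    # tunefs -p prints the current tuneable settings, including whether
    # soft updates journaling is enabled; fsck_ffs -f forces a check even
    # when the file system is marked clean.
    for cmd in "tunefs -p $dev" "fsck_ffs -f $dev"; do
        if [ "${RUN:-0}" = "1" ]; then
            $cmd
        else
            echo "$cmd"
        fi
    done
}

full_check "${1:-/dev/ada0p2}"
```

When run interactively on an SU+J volume, fsck_ffs may ask "USE JOURNAL?"; answering no requests the full check rather than journal recovery. Whether -f alone bypasses the journal may depend on the release, so treat that part as an assumption.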
Basically, the way SU+J seems to operate is this:
http://redundancy.redundancy.org/fscklog2
"Oh hey, I see you shut down uncleanly, let's check everything looks
good, off you go, whee"
Until I actually go and fsck, when I get:
http://redundancy.redundancy.org/fscklog1
So, I understand that journalling doesn't remove the potential need for
a fsck (though I never had this problem with gjournal), but without a
way for the system to detect that a fsck is necessary, this seems like
pretty much a guaranteed recipe for data corruption, and it seems to
offer little to no benefit over plain SU+fsck, or even just mounting
async.
So: is everyone else seeing this? Am I misunderstanding how SU+J should
be used? How should the error resolution process really happen?
Thanks,
David
More information about the freebsd-current
mailing list