Softdep journaling
Jeff Roberson
jroberson at jroberson.net
Wed Jan 20 01:46:58 UTC 2010
Hello,
Many of you may have already noticed that I have implemented a journaling
layer that co-exists with softdep to eliminate fsck after an unclean
shutdown. I have written about this here:
http://jeffr-tech.livejournal.com/
And I have a patch against current here:
http://people.freebsd.org/~jeff/suj.diff
I have been working with McKusick and he has been providing review
feedback. Tegge and kib have been reviewing my rename changes. Peter
Holm has generously provided his time for testing. I am within a week of
being able to commit this to CURRENT. I'm raising this here so people can
discuss the project and I can answer any questions or concerns before it
goes in the tree.
Briefly, I have added an intent log to softdep that journals block
allocations and frees along with inode link count changes. After an
unclean shutdown a special fsck pass reads this journal and frees blocks
and inodes. The recovery pass is not like traditional block journaling:
rather than replaying logged blocks, it actually evaluates the filesystem
state to determine how far along each operation made it and rolls it back
intelligently.
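As a rough illustration of what "evaluating the filesystem state" means, here is a toy model in Python. It is a minimal sketch under stated assumptions. The record types (`JOp`), the `JRec` and `ToyFS` structures and the rollback rules are all hypothetical. They are not the on-disk format or the checker in suj.diff. What it shows is that recovery does not replay data. For each outstanding record it checks what currently references a block or inode, and it releases only what the interrupted operation left orphaned.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class JOp(Enum):
    """Hypothetical record types; the real SUJ on-disk format differs."""
    BLK_ALLOC = auto()   # block allocated on behalf of an inode
    BLK_FREE = auto()    # block released by an inode
    INO_LINK = auto()    # inode link count changed


@dataclass
class JRec:
    op: JOp
    ino: int
    blk: int = -1        # only meaningful for block records


@dataclass
class ToyFS:
    """Filesystem state as found after a crash (toy model)."""
    blk_used: set = field(default_factory=set)     # block bitmap
    blk_owner: dict = field(default_factory=dict)  # block -> inode pointing at it
    ino_used: set = field(default_factory=set)     # inode bitmap
    ino_nlink: dict = field(default_factory=dict)  # stored link counts
    ino_refs: dict = field(default_factory=dict)   # directory entries actually found

    def free_inode(self, ino):
        self.ino_used.discard(ino)
        self.ino_nlink[ino] = 0
        for blk in [b for b, owner in self.blk_owner.items() if owner == ino]:
            del self.blk_owner[blk]
            self.blk_used.discard(blk)


def recover(fs, journal):
    """Compare each outstanding record with the current state and release
    only what the interrupted operation left inconsistent."""
    for rec in journal:
        if rec.op is JOp.BLK_ALLOC:
            # Marked allocated but no inode points at it: release it.
            if rec.blk in fs.blk_used and rec.blk not in fs.blk_owner:
                fs.blk_used.discard(rec.blk)
        elif rec.op is JOp.BLK_FREE:
            # Pointer already cleared but bitmap still set: finish the free.
            if rec.blk not in fs.blk_owner:
                fs.blk_used.discard(rec.blk)
        elif rec.op is JOp.INO_LINK:
            # Trust the directory entries that exist, not the stored count.
            refs = fs.ino_refs.get(rec.ino, 0)
            fs.ino_nlink[rec.ino] = refs
            if refs == 0 and rec.ino in fs.ino_used:
                fs.free_inode(rec.ino)


# Crash 1: block 3 was marked allocated for inode 2 but never attached.
# Crash 2: inode 5 lost its last directory entry before its link count
# and its block 7 were released.
fs = ToyFS(
    blk_used={3, 7},
    blk_owner={7: 5},
    ino_used={2, 5},
    ino_nlink={2: 1, 5: 1},
    ino_refs={2: 1, 5: 0},
)
recover(fs, [JRec(JOp.BLK_ALLOC, ino=2, blk=3), JRec(JOp.INO_LINK, ino=5)])
print(sorted(fs.blk_used), sorted(fs.ino_used))  # [] [2]
```

Because every rule in this toy is decided from the current state, processing the same record twice leaves the state unchanged.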
The worst case journal recovery time I've seen is a couple of minutes;
however, I'm still generating a few hundred megabytes of text describing
the operation when I run fsck so that I can quickly resolve any bugs.
This worst case performance was generated using pho's stress2 and a
completely full 64MB journal containing nearly 2 million outstanding
records. Recovery time for a crash during buildworld, for example, is on
the order of 10 seconds even while producing the text log. Without the
text log I expect the maximum on any drive to be around 2 minutes.
Presently recovery is actually CPU-bound, and I'm using 3-year-old
hardware. Recovery time grows with the size of the journal and shrinks as
processor speed increases; the size of the filesystem makes little
difference.
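The figures above allow a back-of-the-envelope check. The sketch below assumes that "64MB" means 64 MiB and reads "a couple of minutes" as roughly 120 seconds. Both readings are assumptions. It derives the average journal space per record and the implied per-record cost, which includes the text-log overhead. It then applies the scaling described above: time is proportional to journal size, inversely proportional to processor speed, and independent of filesystem size. `estimated_recovery_s` and `cpu_speedup` are hypothetical names.

```python
MIB = 1 << 20

journal_bytes = 64 * MIB      # "completely full 64MB journal" (taken as MiB)
records = 2_000_000           # "nearly 2 million outstanding records"
worst_case_s = 120            # "a couple of minutes", read as ~2 min

bytes_per_record = journal_bytes / records    # average space per record
seconds_per_record = worst_case_s / records   # includes text-log overhead


def estimated_recovery_s(journal_mib, cpu_speedup=1.0):
    """Rough linear model: time grows with how many records a full journal
    holds and shrinks with processor speed; filesystem size does not appear."""
    n = journal_mib * MIB / bytes_per_record
    return n * seconds_per_record / cpu_speedup


print(round(bytes_per_record, 1), round(seconds_per_record * 1e6))  # 33.6 60
print(round(estimated_recovery_s(32)), round(estimated_recovery_s(64, cpu_speedup=2.0)))  # 60 60
```

The figure of about 33.6 bytes is an upper bound on the average record size, because any journal space not used by records (headers, padding) is counted here as well. The second line shows that halving the journal and doubling processor speed each halve the estimate.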
The filesystem cannot be mounted read/write until the journal is
recovered or a full fsck pass is run. The filesystem will be backwards
compatible with earlier ffs implementations. The journal can be enabled
or disabled with tunefs. The only requirement is sufficient free space for
the journal, which is stored in a regular inode.
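The mount rule and the space requirement can be stated as two small predicates. This Python sketch is hypothetical throughout. `SuperblockState`, its fields, `may_mount` and `can_enable_journal` are illustrative names, not real kernel or tunefs interfaces. The sketch models only a journaled filesystem. It also assumes, reading the text literally, that only read/write mounts are gated, so read-only mounts stay possible. Any metadata overhead of the journal inode is ignored.

```python
from dataclasses import dataclass

MIB = 1 << 20


@dataclass
class SuperblockState:
    """Hypothetical summary of what a mount of a journaled filesystem consults."""
    clean: bool              # last unmount was clean
    journal_recovered: bool  # the special fsck pass has processed the journal
    full_fsck_done: bool     # or a traditional full fsck has been run


def may_mount(sb, read_only):
    """Gate only read/write mounts after an unclean shutdown (assumption:
    read-only mounts are not blocked)."""
    if read_only or sb.clean:
        return True
    return sb.journal_recovered or sb.full_fsck_done


def can_enable_journal(free_bytes, journal_bytes):
    """The journal is a regular inode, so enabling it needs that much free
    space; inode and indirect-block overhead is ignored here."""
    return free_bytes >= journal_bytes


after_crash = SuperblockState(clean=False, journal_recovered=False, full_fsck_done=False)
print(may_mount(after_crash, read_only=False))  # False
after_crash.journal_recovered = True
print(may_mount(after_crash, read_only=False))  # True
print(can_enable_journal(free_bytes=10 * MIB, journal_bytes=64 * MIB))  # False
```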
The patch I have presented is mostly complete. It only lacks the recovery
operation for partial truncation. I'm still running through various
scenarios to validate the checker; however, the kernel has been very
stable as of late.
Please raise any comments or concerns here. I'm going to make another
call for testers on current@ and want to keep that reserved for bug
reports.
Thanks,
Jeff