gjournal public alpha release

Wed Aug 3 16:39:04 GMT 2005

Hi!

I'm announcing the first public version of the gjournal GEOM class :)
The code is here: http://ivoras.sharanet.org/gjournal.tgz, together with
a README file (reproduced below).

I'd like to receive as many test and bug reports as possible :)

----
The README file:

What is it
----------

It's a journaling layer in the GEOM subsystem. The intention is to provide
devices (on which filesystems may be hosted) with data journaling
capabilities.

This is my first GEOM class, and also my first significant piece of
kernel programming, so there are bound to be errors from my inexperience.
The code is tested though, and it shouldn't crumble too often. :)

More information is available at:
http://wikitest.freebsd.org/moin.cgi/gjournal

What does it do
---------------

gjournal connects to ("consumes") two devices: one is the "data device"
that is the target for journaling, and the other is the "journal device"
on which data is journaled. For every write request, its data is written
to the journal device, and after some time transferred to the data device.
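
The write path described above can be illustrated with a toy in-memory
model. This is only a sketch, not the gjournal code: the class name, the
one-block-per-write restriction and the way reads are served from pending
journal records are all assumptions made for illustration.

```python
class JournaledDevice:
    """Toy model: a data device fronted by an append-only journal."""

    def __init__(self, num_blocks, block_size=512):
        self.block_size = block_size
        # Stand-in for the data device: a list of fixed-size blocks.
        self.data = [bytes(block_size)] * num_blocks
        # Stand-in for the journal device: records in arrival order.
        self.journal = []

    def write(self, block_no, payload):
        # Every write request is appended to the journal verbatim;
        # the data device is not touched yet.
        if not 0 <= block_no < len(self.data):
            raise IndexError("block number out of range")
        if len(payload) != self.block_size:
            raise ValueError("payload must be exactly one block")
        self.journal.append((block_no, bytes(payload)))

    def read(self, block_no):
        # Assumption: a read returns the newest journaled copy, if any.
        for journaled_no, payload in reversed(self.journal):
            if journaled_no == block_no:
                return payload
        return self.data[block_no]

    def commit(self):
        # "After some time": replay the records in order onto the data
        # device, then empty the journal.
        for block_no, payload in self.journal:
            self.data[block_no] = payload
        committed = len(self.journal)
        self.journal.clear()
        return committed


dev = JournaledDevice(num_blocks=8, block_size=4)
dev.write(3, b"abcd")
dev.write(3, b"wxyz")        # a later write to the same block
pending = len(dev.journal)   # both records are kept in the journal
flushed = dev.commit()       # journal replayed onto the data device
```

Appending to a list stands in for the sequential writes to the journal
device; the scattered writes to the data device only happen in commit().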

Why use it
----------

The principal benefit of this is that the writes to the journal device
are done sequentially and are much faster than direct, scattered writes
to the data device.

Another benefit, not implemented yet, is that it can be used in a
"delayed-commit" mode, aka "Copy-on-write", where the data is stored in
the journal but not automatically committed to the data device. This
allows for dangerous experimenting on the data (maybe a filesystem, with
fsck), and then deciding later whether to commit the changes to the data
device or discard them.
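
Since this mode is not implemented yet, the following is only a
hypothetical sketch of the idea, with invented names (DelayedCommitOverlay,
pending, discard): writes are held aside, reads see them, and the data is
changed only on an explicit commit.

```python
class DelayedCommitOverlay:
    """Hypothetical model of a delayed-commit (copy-on-write) mode."""

    def __init__(self, data_blocks):
        self.data = list(data_blocks)  # stands in for the data device
        self.pending = {}              # block number -> newest journaled contents

    def write(self, block_no, payload):
        if not 0 <= block_no < len(self.data):
            raise IndexError("block number out of range")
        # The write is only recorded; the data device stays untouched.
        self.pending[block_no] = payload

    def read(self, block_no):
        # Pending (uncommitted) contents take precedence over the data device.
        return self.pending.get(block_no, self.data[block_no])

    def commit(self):
        # Keep the experiment: apply all pending writes to the data device.
        for block_no, payload in self.pending.items():
            self.data[block_no] = payload
        self.pending.clear()

    def discard(self):
        # Throw the experiment away: the data device was never modified.
        self.pending.clear()


fs = DelayedCommitOverlay([b"good", b"good"])
fs.write(0, b"junk")               # a "dangerous" change
seen_during_test = fs.read(0)      # the experiment sees its own write
fs.discard()                       # decide not to keep it
seen_after_discard = fs.read(0)    # original contents are back
```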

What works in this version
--------------------------

This is alpha-version software. Don't use it in a production setup.

* journaling with automatic commit
* automatic recovery of the journal on crash

I've tested it by hosting a filesystem on the journaled device and
copying various files to and from it, so it should work well at least
for light loads without panicking the kernel when you look at it :)

Notes
~~~~~

* Since each and every write request is recorded verbatim in the journal,
   together with some system data (overhead), the journal device should
   be big. Based on current preliminary testing, I'd recommend something
   like 500MB to 1GB.
* Making the journal device a md(4) device backed by a file should work,
   but it's not tested and there could be problems with crash recovery.
* Data and journal devices can be on different physical devices, for
   added speed.
* There are some design oddities, like blocking all I/O on the device
   while the entire journal is committed, that won't go away soon, but
   probably will in a later version.

How to use it
-------------

Here's an example, step-by-step:

* Unpack the archive, chdir to the resulting directory
* `make`
* Symlink the resulting .ko file into /boot/kernel/
* `make so`
* Symlink the resulting .so file into /lib/geom/
* `./gjournal load`
* `./gjournal label mydevice /dev/datadevice /dev/journaldevice`
* Use the resulting /dev/journaled/mydevice for testing
* `./gjournal unload`
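
For scripted testing, the gjournal commands from the steps above can be
assembled as a dry run. The helper below is a hypothetical sketch (the
function name and its parameters are not part of gjournal); it only
builds the command lines shown in the list and the resulting device
path, and executes nothing.

```python
def gjournal_session(name, data_dev, journal_dev, tool="./gjournal"):
    """Return the walkthrough's gjournal commands and the device they create."""
    commands = [
        [tool, "load"],                                # load the GEOM class
        [tool, "label", name, data_dev, journal_dev],  # pair data and journal devices
        [tool, "unload"],                              # unload when testing is done
    ]
    # The journaled device appears under /dev/journaled/ after labeling.
    device = "/dev/journaled/" + name
    return commands, device


commands, device = gjournal_session(
    "mydevice", "/dev/datadevice", "/dev/journaldevice")
for argv in commands:
    print(" ".join(argv))
print("test on:", device)
```

The build and symlink steps are left out on purpose, since the names of
the resulting .ko and .so files are not given here.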

Notes
~~~~~

* It's developed for FreeBSD 5.4-RELEASE but it should work on later
   versions.
* You need full system sources present in the usual location
   (/usr/src) to build it.
* You need a kernel built with INVARIANTS and INVARIANTS_SUPPORT to run
   it (or you can modify the Makefile not to define those).
* This is alpha quality software. Do not use on production machines
   and/or data.
* I'd like to hear as many reports of testing as possible. If it crashes,
   I'd appreciate receiving the following data:
   - what you wanted to do
   - what you did (e.g. commands you executed)
   - the configuration of your devices (data and journal devices)
   - is it repeatable?
   - if it's repeatable, set the kern.geom.journal.debug sysctl to 20
     and send me as much of the last part of the kernel log as possible

Benchmarks
----------

I did some quick preliminary (and thus non-scientific and non-conclusive)
benchmarks, and the results are good:

"tar x" = untarring of a (previously cached) tar archive containing
           /usr/src/sys tree
"rm -rf" = doing rm -rf on the untarred tree
"raw" = partition without gjournal
"gj" = the same partition gjournal-ed on another partition on the same
        drive
"SU" = softupdates
"normal" and "sync" are mount methods for UFS

(numbers are seconds)

  Type       | tar x  | rm -rf
----------------------------------
  raw, sync  | 25.0   | 8.5
  raw, normal| 18.9   | 9.6
  raw, SU    | 17.9   | 0.6
  gj, sync   | 23.0   | 7.9
  gj, normal | 11.9   | 8.2

These are results in the best case for gjournal, where there's no
journal commit phase in the middle of benchmarking. Commit delay can
be configured with the kern.geom.journal.commit_delay sysctl (in seconds).
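
To put the table in relative terms, the ratios between the raw and gj
rows can be computed directly from the numbers above. This is plain
arithmetic on the published timings, with hypothetical helper names
(timings, speedup); it adds no new measurements.

```python
# Seconds from the table above: (tar x, rm -rf).
timings = {
    "raw, sync":   (25.0, 8.5),
    "raw, normal": (18.9, 9.6),
    "raw, SU":     (17.9, 0.6),
    "gj, sync":    (23.0, 7.9),
    "gj, normal":  (11.9, 8.2),
}
TAR_X, RM_RF = 0, 1


def speedup(baseline, candidate, column):
    """How many times faster `candidate` is than `baseline` in one column."""
    return timings[baseline][column] / timings[candidate][column]


for mode in ("sync", "normal"):
    for label, column in (("tar x", TAR_X), ("rm -rf", RM_RF)):
        ratio = speedup("raw, " + mode, "gj, " + mode, column)
        print("%-6s %-6s gj is %.2fx the speed of raw" % (mode, label, ratio))
```

For "tar x" in normal mode this gives 18.9 / 11.9, or roughly 1.6 times
faster than the raw partition; in sync mode the gain is under 10%.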

Unfortunately, the best "tar x" result (gj, normal) is not
crash-resistant. Though the journaling appears to be sound, it seems
that FFS/UFS in the "normal" mode (metadata synchronous, data
asynchronous) doesn't always keep the filesystem consistent with the
writes, so a crash does require fsck on the filesystem. The same may be
true of "sync" mode, only harder to provoke. I'd appreciate any help to
explain or solve this, but at the current time it means that this setup
(gjournal + UFS) is NOT viable as a replacement for a journaling
filesystem.

Acknowledgments
---------------

This work is sponsored by Google via the Summer of Code project. Mentors
are Poul-Henning Kamp <phk at FreeBSD.org> and Pawel Jakub Dawidek
<pjd at FreeBSD.org>.

-- 
Every sufficiently advanced magic is indistinguishable from technology
    - Arthur C Anticlarke