Google SoC idea
Richard Coleman
rcoleman at criticalmagic.com
Wed Jun 8 01:18:37 GMT 2005
Scott Long wrote:
> /me jumps up and down and waves his hands
>
> The problem with journalling at the block layer is that you pretty much
> become forced to journal metadata and data, since the block layer really
> doesn't know the distinction, and definitely not in a
> filesystem-independent way (yes, UFS does evil things to the buffer
> cache by representing metadata with negative block numbers, but that is
> just UFS). Full journalling has many drawbacks from the viewpoint of
> speed and complexity, of course. So you really want to be able to do
> just metadata journalling.
>
> Another hard part of distinguishing between metadata and data is that
> filesystems have a habit of migrating disk blocks from holding metadata
> to holding data, and vice versa (think indirect pointer blocks, not
> inode blocks). If you are only replaying metadata, you want to make
> sure that you don't smash data blocks with old metadata.
>
> Coming up with a filesystem-independent way to represent all of this for
> the block layer is not easy. Filesystems would have to be modified to
> provide proper metadata vs. data hints to the block layer.
> And if you're going to do that, then why not just make it a library in
> VFS, like what Darwin does?
>
> The UFS Journalling work is already well underway, and I expect it to
> follow the path of being a VFS library. Note that I'm saying 'library'
> here, not 'layer'. There really is no way to make journalling work with
> an arbitrary filesystem 'for free', whether as a VFS layer or a GEOM
> transform, since journalling is 100% dependent on the filesystem working
> with the buffer-cache to do sane operations in a defined order.
>
> An alternate SoC project that would be very useful is block-level
> snapshots. I'm not sure if I'll be able to retain the filesystem
> snapshot functionality in UFS with journalling enabled, so moving to
> doing the snapshots in the block layer would be a good way to make up
> for this. Beware that while the GEOM transform would be pretty
> straightforward to write, the real trick comes from being able to make
> the consumer of a block device (a filesystem, maybe) flush itself to a
> consistent state while the snapshot is being taken. The infrastructure
> for this is the part that is very interesting, but also the most work.
>
> Scott
Scott,
Have you looked at the journaling layer that Matt has been adding to
DragonFly BSD? What you are talking about appears very similar. Or am I
misunderstanding something?
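
The replay hazard described in the quoted message above (a block that held
metadata when it was journaled, then was reused for file data, must not be
overwritten with the stale metadata at replay time) can be sketched roughly
as below. This is only an illustration, not the UFS or DragonFly design: it
assumes a revoke-record scheme, similar in spirit to the one Linux's JBD
uses, and the names Record and replay, the record layout, and the block
numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One journal entry (hypothetical layout, for illustration only)."""
    txid: int            # transaction sequence number
    kind: str            # "write" (journaled metadata) or "revoke"
    block: int           # disk block number
    payload: bytes = b""


def replay(journal, disk):
    """Replay journaled metadata onto disk (a dict: block number -> bytes)."""
    # Pass 1: remember the newest revoke seen for each block. A revoke is
    # logged when a metadata block is freed and may be reused for data.
    revoked = {}
    for rec in journal:
        if rec.kind == "revoke":
            revoked[rec.block] = max(rec.txid, revoked.get(rec.block, rec.txid))

    # Pass 2: apply metadata writes in transaction order, skipping any
    # write that is not newer than a revoke of the same block.
    writes = sorted((r for r in journal if r.kind == "write"),
                    key=lambda r: r.txid)
    for rec in writes:
        if rec.block in revoked and rec.txid <= revoked[rec.block]:
            continue  # stale metadata: the block may now hold file data
        disk[rec.block] = rec.payload
    return disk


# Block 100 was an indirect block in transaction 1, was freed in
# transaction 2, and now holds unjournaled file data.
journal = [
    Record(1, "write", 100, b"indirect-ptrs-v1"),
    Record(2, "revoke", 100),
    Record(3, "write", 200, b"inode-v2"),
]
disk = {100: b"user-file-data", 200: b"inode-v1"}
replay(journal, disk)
```

After the replay, block 100 still holds the file data and block 200 has the
newer inode contents. Note that the filesystem has to tell the journal when
a block stops being metadata, which is exactly the kind of hint a
filesystem-independent block layer does not get for free.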
Richard Coleman
rcoleman at criticalmagic.com