Google SoC idea

Richard Coleman rcoleman at criticalmagic.com
Wed Jun 8 01:18:37 GMT 2005


Scott Long wrote:
> /me jumps up and down and waves his hands
> 
> The problem with journalling at the block layer is that you pretty much 
> become forced to journal metadata and data, since the block layer really 
> doesn't know the distinction, and definitely not in a 
> filesystem-independent way (yes, UFS does evil things to the buffer 
> cache by representing metadata with negative block numbers, but that is 
> just UFS).  Full journalling has many drawbacks from the viewpoint of 
> speed and complexity, of course.  So you really want to be able to do 
> just metadata journalling.
> 
> Another hard part of distinguishing between metadata and data is that 
> filesystems have a habit of migrating disk blocks from holding metadata 
> to holding data, and vice versa (think indirect pointer blocks, not 
> inode blocks).  If you are only replaying metadata, you want to make 
> sure that you don't smash data blocks with old metadata.
> 
> Coming up with a filesystem independent way to represent all of this for 
> the block layer is not easy.  Filesystems would have to be modified to
> provide proper metadata vs. data hints to the block layer.
> And if you're going to do that, then why not just make it a library in 
> VFS, like what Darwin does?
> 
> The UFS Journalling work is already well underway, and I expect it to 
> follow the path of being a VFS library.  Note that I'm saying 'library' 
> here, not 'layer'.  There really is no way to make journalling work with 
> an arbitrary filesystem 'for free', whether as a VFS layer or a GEOM 
> transform, since journalling is 100% dependent on the filesystem working 
> with the buffer cache to do sane operations in a defined order.
> 
> An alternate SoC project that would be very useful is block-level 
> snapshots.  I'm not sure if I'll be able to retain the filesystem 
> snapshot functionality in UFS with journalling enabled, so moving to 
> doing the snapshots in the block layer would be a good way to make up 
> for this.  Beware that while the GEOM transform would be pretty 
> straightforward to write, the real trick comes from being able to make 
> the consumer of a block device (a filesystem, maybe) flush itself to a 
> consistent state while the snapshot is being taken.  The infrastructure 
> for this is the part that is very interesting, but also the most work.
> 
> Scott

Scott,

Have you looked at the journaling layer that Matt has been adding to 
DragonflyBSD?  What you are talking about appears very similar.  Or am I 
misunderstanding something?

Richard Coleman
rcoleman at criticalmagic.com


More information about the freebsd-hackers mailing list