Google SoC idea

Scott Long scottl at
Wed Jun 8 14:16:51 GMT 2005

Eric Anderson wrote:
> Scott Long wrote:
>> Richard Coleman wrote:
>>> Scott Long wrote:
>>>> /me jumps up and down and waves his hands
>>>> The problem with journalling at the block layer is that you pretty 
>>>> much become forced to journal metadata and data, since the block 
>>>> layer really doesn't know the distinction, and definitely not in a 
>>>> filesystem-independent way (yes, UFS does evil things to the buffer 
>>>> cache by representing metadata with negative block numbers, but that 
>>>> is just UFS).  Full journalling has many drawbacks from the 
>>>> viewpoint of speed and complexity, of course.  So you really want to 
>>>> be able to do just metadata journalling.
>>>> Another hard part of distinguishing between metadata and data is 
>>>> that filesystems have a habit of migrating disk blocks from holding 
>>>> metadata to holding data, and vice versa (think indirect pointer 
>>>> blocks, not inode blocks).  If you are only replaying metadata, you 
>>>> want to make sure that you don't smash data blocks with old metadata.
>>>> Coming up with a filesystem independent way to represent all of this 
>>>> for the block layer is not easy.  Filesystems would have to be able 
>>>> to be modified to provide proper metadata vs. data hints to the 
>>>> block layer. And if you're going to do that, then why not just make 
>>>> it a library in VFS, like what Darwin does?
>>>> The UFS Journalling work is already well underway, and I expect it 
>>>> to follow the path of being a VFS library.  Note that I'm saying 
>>>> 'library' here, not 'layer'.  There really is no way to make 
>>>> journalling work with an arbitrary filesystem 'for free', whether as 
>>>> a VFS layer or a GEOM transform, since journalling is 100% dependent 
>>>> on the filesystem working with the buffer-cache to do sane 
>>>> operations in a defined order.
>>>> An alternate SoC project that would be very useful is block-level 
>>>> snapshots.  I'm not sure if I'll be able to retain the filesystem 
>>>> snapshot functionality in UFS with journalling enabled, so moving to 
>>>> doing the snapshots in the block layer would be a good way to make 
>>>> up for this.  Beware that while the GEOM transform would be pretty 
>>>> straightforward to write, the real trick comes from being able to 
>>>> make the consumer of a block device (a filesystem, maybe) flush 
>>>> itself to a consistent state while the snapshot is being taken.  The 
>>>> infrastructure for this is the part that is very interesting, but 
>>>> also the most work.
>>>> Scott
>>> Scott,
>>> Have you looked at the journaling layer that Matt has been adding to 
>>> DragonflyBSD?  What you are talking about appears very similar.  Or 
>>> am I misunderstanding something?
>>> Richard Coleman
>>> rcoleman at
>> Ah, you might have misunderstood my use of the term 'VFS library'.  This
>> is distinctly different from a 'VFS layer', which is what Matt did.
>> I've looked extensively at his work, but unfortunately it doesn't solve
>> the kinds of problems that I'm looking to solve.  After discussing
>> journalling this evening with the author of BeFS and HFS+J, I'm pretty
>> happy that I'm taking the approach that I am.
> Maybe a good SoC project (but maybe too much work) would be getting the 
> clustering UFS stuff going.. :)
> Eric

That is more along the lines of a good master's or PhD topic.
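
To illustrate the point quoted above about not smashing data blocks with
old metadata during replay, here is a rough sketch in Python. It is a toy
model, not the UFS journalling code: the record layout, the "revoke"
record (an idea borrowed from ext3-style journals, assumed here purely for
illustration), and the names replay() and naive_replay() are all
hypothetical. The disk is just a dict mapping block numbers to contents.

```python
# Toy model of metadata-only journal replay. Every name here is hypothetical.

def replay(disk, journal):
    """Replay logged metadata onto disk, skipping records made stale by a
    later revoke of the same block.

    journal is a list of tuples in log order:
      ("meta", block_no, payload)  a logged metadata write
      ("revoke", block_no)         the block was freed and may now hold data
    """
    # Pass 1: remember the position of the last revoke for each block.
    last_revoke = {}
    for seq, rec in enumerate(journal):
        if rec[0] == "revoke":
            last_revoke[rec[1]] = seq

    # Pass 2: apply only metadata records logged after the last revoke.
    for seq, rec in enumerate(journal):
        if rec[0] != "meta":
            continue
        _, block_no, payload = rec
        if seq < last_revoke.get(block_no, -1):
            continue  # stale: the block stopped being metadata later on
        disk[block_no] = payload
    return disk


def naive_replay(disk, journal):
    """Replay every metadata record with no notion of block reuse."""
    for rec in journal:
        if rec[0] == "meta":
            disk[rec[1]] = rec[2]
    return disk


# Block 7 was an indirect block, then was freed and reused for file data.
# File data is not journalled, so the log only knows the old metadata.
journal = [
    ("meta", 7, "indirect-ptrs-v1"),
    ("revoke", 7),
    ("meta", 3, "inode-v2"),
]
disk_after_crash = {3: "inode-v1", 7: "user-data"}

safe = replay(dict(disk_after_crash), journal)
naive = naive_replay(dict(disk_after_crash), journal)
```

With this input, naive_replay() overwrites the file data in block 7 with
the old indirect block, while replay() leaves it alone and still brings
the inode in block 3 up to date. The catch is that only the filesystem
knows when to emit the revoke, which is why the block layer cannot do
this on its own.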


More information about the freebsd-hackers mailing list