Google SoC idea
Eric Anderson
anderson at centtech.com
Wed Jun 8 11:47:32 GMT 2005
Scott Long wrote:
> Richard Coleman wrote:
>
>> Scott Long wrote:
>>
>>> /me jumps up and down and waves his hands
>>>
>>> The problem with journalling at the block layer is that you pretty
>>> much become forced to journal metadata and data, since the block
>>> layer really doesn't know the distinction, and definitely not in a
>>> filesystem-independent way (yes, UFS does evil things to the buffer
>>> cache by representing metadata with negative block numbers, but that
>>> is just UFS). Full journalling has many drawbacks from the viewpoint
>>> of speed and complexity, of course. So you really want to be able to
>>> do just metadata journalling.
>>>
>>> Another hard part of distinguishing between metadata and data is that
>>> filesystems have a habit of migrating disk blocks from holding
>>> metadata to holding data, and vice versa (think indirect pointer
>>> blocks, not inode blocks). If you are only replaying metadata, you
>>> want to make sure that you don't smash data blocks with old metadata.
>>>
>>> Coming up with a filesystem independent way to represent all of this
>>> for the block layer is not easy. Filesystems would have to be able
>>> to be modified to provide proper metadata vs. data hints to the block
>>> layer. And if you're going to do that, then why not just make it a
>>> library in VFS, like what Darwin does?
>>>
>>> The UFS Journalling work is already well underway, and I expect it to
>>> follow the path of being a VFS library. Note that I'm saying
>>> 'library' here, not 'layer'. There really is no way to make
>>> journalling work with an arbitrary filesystem 'for free', whether as
>>> a VFS layer or a GEOM transform, since journalling is 100% dependent
>>> on the filesystem working with the buffer-cache to do sane operations
>>> in a defined order.
>>>
>>> An alternate SoC project that would be very useful is block-level
>>> snapshots. I'm not sure if I'll be able to retain the filesystem
>>> snapshot functionality in UFS with journalling enabled, so moving to
>>> doing the snapshots in the block layer would be a good way to make up
>>> for this. Beware that while the GEOM transform would be pretty
>>> straightforward to write, the real trick comes from being able to
>>> make the consumer of a block device (a filesystem, maybe) flush
>>> itself to a consistent state while the snapshot is being taken. The
>>> infrastructure for this is the part that is very interesting, but
>>> also the most work.
>>>
>>> Scott
>>
>>
>>
>> Scott,
>>
>> Have you looked at the journaling layer that Matt has been adding to
>> DragonFly BSD? What you are talking about appears very similar. Or am
>> I misunderstanding something?
>>
>> Richard Coleman
>> rcoleman at criticalmagic.com
>
>
> Ah, you might have misunderstood my use of the term 'VFS library'. This
> is distinctly different from a 'VFS layer', which is what Matt did.
> I've looked extensively at his work, but unfortunately it doesn't solve
> the kinds of problems that I'm looking to solve. After discussing
> journalling this evening with the author of BeFS and HFS+J, I'm pretty
> happy that I'm taking the approach that I am.
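
To make the quoted point about blocks migrating from metadata to data a
bit more concrete, here is a rough sketch of a metadata-only replay that
avoids smashing a reused block. It assumes a "revoke record" scheme: the
filesystem logs a revoke whenever a metadata block is freed, and replay
skips older writes to that block. The record types (Write, Revoke) and
the replay() function are made up for illustration; this is not the
design of the UFS journalling work, just one way the problem can be
handled.

```python
from dataclasses import dataclass


@dataclass
class Write:
    """A journalled metadata write (hypothetical record type)."""
    seq: int
    block: int
    payload: bytes


@dataclass
class Revoke:
    """Logged when a metadata block is freed and may be reused for data."""
    seq: int
    block: int


def replay(journal, disk):
    """Replay metadata writes onto `disk` (a dict of block -> bytes).

    A write is skipped if a later revoke exists for the same block,
    because that block may now hold file data.
    """
    # Pass 1: find the newest revoke sequence number for each block.
    revoked = {}
    for rec in journal:
        if isinstance(rec, Revoke):
            revoked[rec.block] = max(rec.seq, revoked.get(rec.block, -1))

    # Pass 2: apply writes in sequence order unless a later revoke covers them.
    writes = sorted((r for r in journal if isinstance(r, Write)),
                    key=lambda r: r.seq)
    for rec in writes:
        if rec.seq < revoked.get(rec.block, -1):
            continue  # stale metadata; the block was freed afterwards
        disk[rec.block] = rec.payload
    return disk


# Block 7 was an indirect block, then freed and reused for user data.
disk = {7: b"user data", 9: b"old inode"}
journal = [
    Write(1, 7, b"indirect ptrs"),
    Revoke(2, 7),
    Write(3, 9, b"new inode"),
]
replay(journal, disk)
```

After the replay, block 7 still holds the user data and block 9 has the
new inode contents. Note that the filesystem has to emit the revoke
itself, which is the same "filesystem must provide hints" dependency
described in the quoted text.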
Maybe a good SoC project (but maybe too much work) would be getting the
clustering UFS stuff going.. :)
Eric
--
------------------------------------------------------------------------
Eric Anderson Sr. Systems Administrator Centaur Technology
A lost ounce of gold may be found, a lost moment of time never.
------------------------------------------------------------------------