UFS2 snapshots on large filesystems

Sun Nov 13 09:17:37 PST 2005

Xin LI wrote:
> On 11/5/05, Scott Long <scottl at samsco.org> wrote:
> 
>>The UFS snapshot code was written at a time when disks were typically
>>around 4-9GB in size, not 400GB in size =-)  Unfortunately, the amount
> 
> 
> s/size/cylinder groups/g :-)
> 
> 
>>of time it takes to do the initial snapshot bookkeeping scales linearly
>>with the size of the drive, and many people have reported that it takes
>>a considerable amount of time (anywhere from several minutes to several
>>dozen minutes) on large drives/arrays like you describe.  So, you should
>>test and plan accordingly if you are interested in using them.
> 
> 
> I have some ideas about lazy snapshotting.  But unfortunately I don't
> have much time to implement a prototype ATM, and I think we really
> need a file system that is capable of:
>  - Handling a large number of files in one directory (say, some sort of
> indexing mechanism, etc.  And yes, I know that this is somewhat
> insane, but the [ab]use is present in many large e-mail systems that
> use mailbox)
>  - Effective recovery.  Personally I do not buy journalling much, and
> I think the problem could be resolved by something like what WAFL did.
> 
> I think that JUFS would provide some help for (2), do you have some
> plan about (1)?
> 

I guess that UFS_DIRHASH doesn't give enough benefit for your situation?
The idea of doing alternate directory layouts (such as b-trees) has been
proposed a number of times.  Apparently there was an idea at one point
for UFS to generate a b-tree layout for a directory and save it on
disk as a cache.  The primary method of directory storage would remain
the traditional linear way so that compatibility is preserved, but OS's
that were aware of the cache could use it too.  There are still some
reserved flags and fields in UFS2 for doing this, in case you're
interested.  Since it requires double bookkeeping for link creation and
removal, I'm not sure how speedy it is for anything other than
VOP_LOOKUP operations.  An alternate idea I've had is to break with
compatibility and do b-trees or something similar as the native
format for UFS3 (along with native journalling and other things).
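
To make the double-bookkeeping concern concrete, here is a rough
userland sketch in Python, not kernel code.  Everything in it is
hypothetical: the Directory class and its method names are made up for
illustration, and a sorted list searched with bisect stands in for the
on-disk b-tree cache.  The linear list stays authoritative, so a
cache-unaware reader still works, while every create and remove has to
touch both structures.

```python
import bisect


class Directory:
    """Toy directory: a linear entry list plus an optional sorted-name cache."""

    def __init__(self, use_cache=True):
        self.entries = []           # authoritative linear layout: (name, inode)
        self.use_cache = use_cache  # False models an OS unaware of the cache
        self.index = []             # sorted (name, inode) pairs; b-tree stand-in

    def lookup(self, name):
        """Return the inode for name, or None if it is absent."""
        if self.use_cache:
            # Binary search in the cache: O(log n) comparisons.
            i = bisect.bisect_left(self.index, (name,))
            if i < len(self.index) and self.index[i][0] == name:
                return self.index[i][1]
            return None
        # Cache-unaware path: scan the linear layout, O(n).
        for entry_name, inode in self.entries:
            if entry_name == name:
                return inode
        return None

    def create(self, name, inode):
        if self.lookup(name) is not None:
            raise FileExistsError(name)
        self.entries.append((name, inode))            # bookkeeping 1: linear layout
        if self.use_cache:
            bisect.insort(self.index, (name, inode))  # bookkeeping 2: cache

    def remove(self, name):
        # The linear layout must be updated regardless of the cache.
        for pos, (entry_name, _) in enumerate(self.entries):
            if entry_name == name:
                del self.entries[pos]
                break
        else:
            raise FileNotFoundError(name)
        if self.use_cache:
            # Second update: drop the matching entry from the cache too.
            del self.index[bisect.bisect_left(self.index, (name,))]
```

In this toy model only lookup() gets cheaper with the cache enabled;
create() and remove() do the linear-layout work as before plus the
extra index update, which is the cost I'm unsure about above.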

Scott