Writing contiguously to UFS2?

Eric Anderson anderson at freebsd.org
Fri Sep 21 05:10:31 PDT 2007


Fluffles wrote:
> 
> Ivan Voras wrote:
>  > These 4 drives are used in what RAID form? If it's RAID0/stripe, you
>  > can't avoid data being spread across the drives (since this is the point
>  > of having RAID0).
> 
> It's an array of 8 drives in gconcat, so they are using the JBOD /
> spanning / concatenation scheme, which does not have a RAID designation
> but rather is just a bunch of disks glued to each other. Thus, there is
> no striping involved. Offset 0 to 500GB will 'land' on disk0, and then
> disk1 takes over, in this scheme:
> 
> offset 0 -------------------------------------------------- offset 4TB
> disk0 -> disk1 -> disk2 -> disk3 -> disk4 -> disk5 -> disk6 -> disk7
> 
> (for everyone not familiar with concatenation)
> 
> 
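(For anyone wanting to reproduce that layout: a minimal gconcat setup
might look something like the sketch below.  The disk names and mount
point are only placeholders.)

  # load the concat GEOM class and glue eight disks end-to-end
  kldload geom_concat
  gconcat label data ad4 ad6 ad8 ad10 ad12 ad14 ad16 ad18
  # the concatenated provider shows up as /dev/concat/data
  newfs -U /dev/concat/data
  mount /dev/concat/data /storage
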
>  > If the drives are simply concatenated, then there might be weird
>  > behavior in choosing what cylinder groups to allocate for files. UFS
>  > forces big files to be spread across cylinder groups so that no large
>  > file fills entire cgs.
> 
> Exactly! And this is my problem. I do not like this behavior, for
> various reasons:
> - it causes lower sequential transfer speed, because the disks have to
> seek regularly
> - UFS causes 2 reads per second when writing sequentially, probably some
> meta-data thing, but I don't like it either
> - files are not written contiguously, which causes fragmentation;
> essentially UFS forces big files to become fragmented this way.
> 
> Even worse: data is being stored at weird locations, so that my
> energy-efficient NAS project becomes crippled. Even with the first 400GB
> of data, it's storing that on the first 4 disks in my concat
> configuration, so that when opening folders I have to wait 10 seconds
> before the disk is spun up. For regular operation, multiple disks have
> to be spun up, which is impractical and unnecessary. Is there any way to
> force UFS to write contiguously? Otherwise I think I should try Linux
> with some Linux filesystem (XFS, ReiserFS, JFS) in the hope that they do
> not suffer from this problem.
> 
> In the past, when testing geom_raid5, I tried to tune newfs parameters
> so that it would write contiguously, but there were still regular
> 2-phase writes, which means data was not written contiguously. I really
> dislike this behavior.

This notion of breaking up large blocks of data into smaller chunks is
fundamental to the UFS (well, FFS) filesystem, and has been around for
ages.  I'm not saying it's the One True FS Format by any means, but many,
many other file systems use the same principles.

The largest amount of a single file that can be placed in one cylinder
group is calculated at newfs time, which also determines how many
cylinder groups there should be.  I think the largest size I've seen was
something in the 460MB-ish range, meaning any contiguous write above that
would span more than one cylinder group.
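
For what it's worth, dumpfs shows the values newfs picked, and the -e
(maxbpg) knob controls how many blocks a single file may allocate from
one cylinder group before allocation is forced to move on.  The sketch
below is only illustrative; the device name and numbers are made up:

  # inspect the cylinder group count and maxbpg the filesystem was
  # built with
  dumpfs /dev/concat/data | head -20

  # at newfs time, let a single file claim far more blocks from one
  # cylinder group before it is pushed to the next one
  newfs -U -e 8192 /dev/concat/data

  # or raise maxbpg on an existing, unmounted filesystem
  tunefs -e 8192 /dev/concat/data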

The limit on the maximum cylinder group size also has another bad side
effect: the more cylinder groups you have, the longer it takes to create
a snapshot.
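
This is easy to see by simply timing snapshot creation; the paths here
are placeholders:

  # snapshot creation has to touch every cylinder group
  time mksnap_ffs /storage /storage/.snap/test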

I recommend trying msdosfs.  On recent -CURRENT it should perform
fairly well (akin to UFS2, I think), and, if I recall correctly, it has
a more contiguous block layout.
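
Something along these lines (device and mount point are placeholders)
would give a FAT32 filesystem to compare against, though keep FAT32's
4GB-per-file limit in mind:

  # build and mount a FAT32 filesystem for comparison
  newfs_msdos -F 32 /dev/concat/data
  mount_msdosfs /dev/concat/data /storage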

In the end, extending UFS2 to support much larger cylinder group sizes
would be hugely beneficial.  Instead of working to make XFS, reiserfs,
JFS, ext[23], etc. writable (most of which are GPL'ed), why not start
the (immensely huge) task of a UFS3, with support for all the things we
need for the next 5-10 years?  UFS2 has served well from 5.x->7.x, but
what about the future?

Making a UFS3 takes time and dedication from developers.

Eric



