Writing contiguously to UFS2?

Bruce Evans brde at optusnet.com.au
Wed Sep 26 00:59:35 PDT 2007


On Tue, 25 Sep 2007, Rick C. Petty wrote:

> On Fri, Sep 21, 2007 at 02:45:35PM +0200, Ivan Voras wrote:
>> Stefan Esser wrote:
>>
>> From experience (not from reading code or the docs) I conclude that
>> cylinder groups cannot be larger than around 190 MB. I know this from
>> numerous runnings of newfs and during development of gvirstor which
>> interacts with cg in an "interesting" way.
>
> Then you didn't run newfs enough:
>
> # newfs -N -i 12884901888 /dev/gvinum/mm-flac
> density reduced from 2147483647 to 3680255
> /mm/flac: 196608.0MB (402653184 sectors) block size 16384, fragment size 2048
>        using 876 cylinder groups of 224.50MB, 14368 blks, 64 inodes.

That's insignificantly more.  Even doubling the size wouldn't make much
difference.  I see differences of at most 25% going the other way,
halving the block size twice (which halves the cg size 4 times).  On ffs1:

     4K blocks, 512-frags -e 512  (broken default):     40MB/S
     4K blocks, 512-frags -e 1024 (broken default):     44MB/S
     4K blocks, 512-frags -e 2048 (best), kernel fixes: 47MB/S
     4K blocks, 512-frags -e 8192 (try too hard), kernel fixes
        (kernel fixes are not complete enough to handle this case;
        defaults and -e values which are < the cg size work best except
        possibly when the fixes are complete):          45MB/S
     16K blocks, 2K-frags -e 2K   (broken default):     50MB/S
     16K blocks, 2K-frags -e 4K   (fixed default):      50.5MB/S
     16K blocks, 2K-frags -e 8K   (best):               51.5MB/S
     16K blocks, 2K-frags -e 64K  (try too hard):       < 51MB/S again

     Getting a 3% improvement just by avoiding a seek or 2 every cg is
     very surprising for 16K-blocks with 2K frags.  There has to be a
     seek for every cg, and bugs give 2 seeks.  However, with -e 2K, that
     is only 2 extra seeks every 2048 blocks, where the block size is
     large, so I would have expected an improvement of at most 2 in 2048.
     The access pattern is probably confusing the drive's cache (it's an
     old ATA drive with only 2MB cache).

> If you wish to play around with the block/frag sizes, you can greatly
> increase the CG size:
>
> # newfs -N -f 8192 -b 65536 -i 12884901888 /dev/gvinum/mm-flac
> density reduced from 2147483647 to 14868479
> /mm/flac: 196608.0MB (402653184 sectors) block size 65536, fragment size 8192
>        using 55 cylinder groups of 3628.00MB, 58048 blks, 256 inodes.
>
> Doing this is quite appropriate for large disks.  This last command means:
> blocks are allocated in 64k chunks and the minimum allocation size is 8k.
> Some may say this is wasteful, but one could also argue that using less
> than 10% of your inodes is also wasteful.

Both are wasteful.  The kernel buffer cache is tuned for 16K-blocks.
64K-blocks cause either resource contention (if you don't tune BKVASIZE)
or bogusly reduced resources (if you do tune it without fixing other
really arcane parameters (wrong magic numbers in source code...)).
There is lots of FUD about block sizes larger than 16K causing bugs,
but I haven't seen any problems from them except slowness.  64K-blocks
also cause slowness in general because they are just too big, but this
shouldn't be a problem if most files are large.

> Here might be an interesting experiment to try.  Write a new version of
> /usr/src/sbin/newfs/mkfs.c that doesn't have the restriction that the free
> fragment bitmap resides in one block.  I'm not 100% sure if the FFS code
> would handle it properly, but in theory it should work (the offsets are
> stored in the superblocks).  This is the biggest restriction on the CG
> size.  You should be able to create 2-4 CGs to span each of your 1TB
> drives without increasing the block size and thus minimum allocation unit.

In theory it won't work.  From fs.h:

%%%
/*
 * The size of a cylinder group is calculated by CGSIZE. The maximum size
 * is limited by the fact that cylinder groups are at most one block.
 * Its size is derived from the size of the maps maintained in the
 * cylinder group and the (struct cg) size.
 */
%%%

Only offsets to the inode blocks, etc. are stored in the superblock.

Bruce


More information about the freebsd-fs mailing list