tuning FFS for large files Re: A specific example of a disk i/o problem
Dieter
freebsd at sopwith.solgatos.com
Sun Oct 18 04:44:44 UTC 2009
> > I found a clue! The problem occurs with my big data partitions,
> > which are newfs-ed with options intended to improve things.
> >
> > Reading a large file from the normal ad4s5b partition only delays other
> > commands slightly, as expected. Reading a large file from the tuned
> > ad4s11 partition yields the delay of minutes for other i/o.
> > ...
> > Here is the newfs command used for creating large data partitions:
> > newfs -e 57984 -b 65536 -f 8192 -g 67108864 -h 16 -i 67108864 -U -o time $partition
>
> Any block size above the default (16K) tends to thrash and fragment buffer
> cache virtual memory. This is obviously a good pessimization with lots of
> small files, and apparently, as you have found, it is a good pessimization
> with a few large files too. I think severe fragmentation can easily take
> several seconds to recover from. The worst case for causing fragmentation
> is probably a mixture of small and large files.
Is there any way to avoid the "thrash and fragment buffer cache virtual
memory" problem other than keeping the block size at 16K or smaller?
> Some users fear fs consistency bugs with block sizes >= 16K, but I've never
> seen them cause any bugs except performance ones.
Yep, many TB of files on filesystems created with the above newfs command and
no corruption/consistency problems.
> > And they have way more inodes than needed. (IIRC it doesn't actually
> > use -i 67108864)
>
> It has to have at least 1 inode per cg, and may as well have a full block
> of them, which gives a fairly large number of inodes especially if the
> block size is too large (64K), so the -i ratio is limited.
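That floor is easy to check with arithmetic. A quick sketch (my own
back-of-the-envelope math, not newfs source code), assuming UFS2's 256-byte
on-disk inodes:

```python
# One full filesystem block of inodes is the floor newfs cannot go
# below, no matter how large an -i density is requested.
INODE_SIZE = 256  # bytes per UFS2 on-disk inode (assumption, see above)

def min_inodes_per_cg(bsize):
    """Inodes that fit in one full block of size bsize."""
    return bsize // INODE_SIZE

print(min_inodes_per_cg(16384))  # 64  -- matches the 16K-block run below
print(min_inodes_per_cg(65536))  # 256 -- matches the 64K-block run below
```

So a 64K block size quadruples the per-cylinder-group minimum, which is why
reducing the inode count means reducing the number of cylinder groups.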
I converted a few filesystems to the default. In addition to losing space,
fsck time went through the roof. So back to playing with newfs options.
Larger block/frag sizes allow fewer, larger cylinder groups (presumably
because each group's bookkeeping has to fit in a single block), which
reduces the number of inodes by more than the larger block size increases
the per-group minimum. From my reading of the newfs man page, -c only
allows making cylinder groups smaller, not larger, and that appears to be
the case in practice.
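The cylinder-group counts in the runs below can be sanity-checked with plain
arithmetic. This sketch just recomputes them from the sector counts and
blocks-per-group figures in the newfs output (nothing here is newfs
internals, and the inode totals land within a couple of what df reports,
since the first inodes are reserved):

```python
import math

SECTOR = 512  # bytes per sector
# (total sectors, block size, blks per cg, inodes per cg), copied from
# the four newfs runs below
runs = [
    (883205320, 16384, 11758, 23552),   # default
    (883205320, 16384, 14360, 64),      # -i/-g tuned
    (883205312, 65536, 58048, 256),     # -b 65536
    (883205248, 65536, 261129, 512),    # -b 65536 -f 65536
]
for sectors, bsize, blks, icg in runs:
    ncg = math.ceil(sectors * SECTOR / (blks * bsize))
    print(f"{ncg} cylinder groups, ~{ncg * icg} inodes total")
# 2348, 1923, 119, and 27 groups respectively, matching newfs
```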
default:
newfs -U /dev/ad14s4
/dev/ad14s4: 431252.6MB (883205320 sectors) block size 16384, fragment size 2048
using 2348 cylinder groups of 183.72MB, 11758 blks, 23552 inodes.
Filesystem 1M-blocks Used Avail Capacity iused ifree %iused Mounted on
/dev/ad14s4 417678 0 384263 0% 2 55300092 0%
fsck -fp: real 0m37.165s
Attempt to reduce number of inodes:
newfs -U -i 134217728 -g 134217728 -h 16 -e 261129 /dev/ad14s4
density reduced from 134217728 to 3676160
/dev/ad14s4: 431252.6MB (883205320 sectors) block size 16384, fragment size 2048
using 1923 cylinder groups of 224.38MB, 14360 blks, 64 inodes.
Filesystem 1M-blocks Used Avail Capacity iused ifree %iused Mounted on
/dev/ad14s4 431162 0 396669 0% 2 123068 0%
fsck -fp: real 0m32.687s
Bigger block size:
newfs -U -i 134217728 -g 134217728 -h 16 -e 261129 -b 65536 /dev/ad14s4
increasing fragment size from 2048 to block size / 8 (8192)
density reduced from 134217728 to 14860288
/dev/ad14s4: 431252.6MB (883205312 sectors) block size 65536, fragment size 8192
using 119 cylinder groups of 3628.00MB, 58048 blks, 256 inodes.
Filesystem 1M-blocks Used Avail Capacity iused ifree %iused Mounted on
/dev/ad14s4 431230 0 396731 0% 2 30460 0%
fsck -fp: real 0m3.144s
Bigger block size and bigger frag size:
newfs -U -i 134217728 -g 134217728 -h 16 -e 261129 -b 65536 -f 65536 /dev/ad14s4
density reduced from 134217728 to 66846720
/dev/ad14s4: 431252.6MB (883205248 sectors) block size 65536, fragment size 65536
using 27 cylinder groups of 16320.56MB, 261129 blks, 512 inodes.
Filesystem 1M-blocks Used Avail Capacity iused ifree %iused Mounted on
/dev/ad14s4 431245 0 396745 0% 2 13820 0%
fsck -fp: real 0m0.369s
With -b 65536 -f 65536 I'm finally approaching a reasonable number of inodes
(even fewer would be better). The fsck time varies by a factor of over 100,
and results are roughly similar on filesystems that actually contain files.
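To put the "factor of over 100" in one place, here are the four measured
fsck -fp wall-clock times from above and the worst-to-best ratio:

```python
# Measured fsck -fp real times (seconds) from the four newfs runs above
fsck_secs = {
    "default (16K/2K)":     37.165,
    "-i/-g tuned (16K/2K)": 32.687,
    "-b 65536 -f 8192":      3.144,
    "-b 65536 -f 65536":     0.369,
}
ratio = fsck_secs["default (16K/2K)"] / fsck_secs["-b 65536 -f 65536"]
print(f"slowest vs fastest: {ratio:.0f}x")  # about 101x
```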
More information about the freebsd-performance mailing list