Feedback on UFS2 tuning for large number of small files (~100m)
Ciprian Dorin Craciun
ciprian.craciun at gmail.com
Wed Jun 8 08:15:22 UTC 2016
Hello all! (Please keep me in CC as I'm not subscribed to the mailing
list. Should I perhaps post this to the `freebsd-fs` mailing list?)
I would like your feedback on tuning a UFS2 file-system for the
following use-case, which is very similar to a maildir mail server. I
tried to look for hints on the internet, but found nothing more
in-depth than enabling soft-updates, `noatime`, etc.
The main usage of the file-system is:
* there are 4 separate file stores, each with about 50 million files,
all on the same partition;
* all of the 4 file stores have a dispersed layout on two levels (i.e.
`XX/YY/ZZ...`, where `ZZ...` is a 64-character hexadecimal string); as a
consequence there shouldn't be more than one thousand files per leaf
folder;
* all of the files above are around 2-3 KiB;
* these files are read-mostly, and they are never deleted;
* there is almost no access contention, neither read nor write;
* there are 4 matching "queue" stores, dispersed on a single level,
containing symlinks;
* each symlink points to a path roughly 100-200 characters in length;
* I wouldn't expect more than a few thousand files for each store;
* the symlinks are constantly `rename`-d in and out of these folders;
* these folders are constantly listed, by 4-32 parallel processes (not
multi-threaded);
* (basically I use these stores to emulate a queuing system, and I'm
careful that each process tries the leaf folders in random order, thus
reducing contention, and pauses if the queue "seems" empty;)
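To make the layout and the queue handling described above concrete, here is a minimal sketch in Python. It rests on assumptions not stated in this post: that `XX` and `YY` are the first two pairs of hexadecimal digits of the file name, and that a worker "takes" a queue entry by renaming the symlink into a private working folder. The names `store_path` and `claim_one` are hypothetical.

```python
import os
import random


def store_path(root, key):
    """Map a 64-character hex key to root/XX/YY/<key>.

    Assumption: XX and YY are the first two byte pairs of the key.
    """
    return os.path.join(root, key[0:2], key[2:4], key)


def claim_one(queue_dir, work_dir):
    """Try to claim one symlink from queue_dir by renaming it into work_dir.

    rename() is atomic within one file-system, so when several processes
    race for the same entry only one of them succeeds. Returns the new
    path of the claimed symlink, or None if nothing could be claimed.
    """
    names = os.listdir(queue_dir)
    random.shuffle(names)  # random order, to reduce contention
    for name in names:
        dst = os.path.join(work_dir, name)
        try:
            os.rename(os.path.join(queue_dir, name), dst)
        except FileNotFoundError:
            continue  # another process claimed this entry first
        return dst
    return None
```

A worker would call `claim_one` in a loop and sleep for a while whenever it returns `None`, which corresponds to the "pause if the queue seems empty" behaviour.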
As sidenotes:
* the partition is backed by two mirrored disks (which I'm assuming
are rotating SCSI disks);
* persistence in case of power or system failure (i.e. files getting
truncated or missing) is not so critical for my use-case;
* however file-system consistency on failure (i.e. getting a correctly
mountable file-system) is important, thus from what I've read in the
`mount` man-page, `async` is not an option;
* the system has plenty of RAM (32 GiB), however it is constantly
under 100% CPU load by processes on nice level 10;
* this system is dedicated to the task at hand, therefore there is no
other background contention;
The problem that prompted me to ask the community for feedback is that
under load (i.e. 100% CPU usage by processes on nice level 10), even
listing the file-system seems to stall, for anything from a fraction of
a second up to a few seconds.
The output of `iostat -w 30 -d -C -x -I` under load is (the values are
totals accumulated over each 30-second interval, not per-second
averages):
~~~~
device        r/i        w/i       kr/i         kw/i qlen  tsvc_t/i    sb/i  us ni sy in  id
ada0    1243893.0  4988740.0  6447101.5  311428382.5  600  812579.1  8698.9   0  0  0  0 100
ada1    1243889.0  4988824.0  6429851.0  311428550.5  520  766389.6  8437.3

device        r/i        w/i       kr/i         kw/i qlen  tsvc_t/i    sb/i  us ni sy in  id
ada0        582.0    12510.0     2328.0     152986.5  383    9463.4    28.9   0  3  1  0  96
ada1        587.0    12465.0     2348.0     152806.5  343    9107.8    28.7

device        r/i        w/i       kr/i         kw/i qlen  tsvc_t/i    sb/i  us ni sy in  id
ada0        792.0    12933.0     3168.0     157643.5  542   11178.8    29.1   0  3  1  0  96
ada1        791.0    12893.0     3164.0     157651.5  544   10591.2    28.5
~~~~
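Since the figures above are totals per interval, dividing by 30 gives per-second rates that are easier to compare with what the disks can sustain. The following Python sketch does this for the second report only (the first one is far larger, presumably because it covers the whole time since boot). The dictionary layout and the `per_second` helper are just illustrative choices.

```python
INTERVAL_S = 30  # matches `-w 30`

# Totals copied from the second report above.
samples = {
    "ada0": {"r": 582.0, "w": 12510.0, "kr": 2328.0, "kw": 152986.5},
    "ada1": {"r": 587.0, "w": 12465.0, "kr": 2348.0, "kw": 152806.5},
}


def per_second(totals, interval_s=INTERVAL_S):
    """Convert per-interval totals into per-second rates."""
    return {name: value / interval_s for name, value in totals.items()}


rates = {dev: per_second(totals) for dev, totals in samples.items()}

for dev, r in rates.items():
    avg_write_kb = samples[dev]["kw"] / samples[dev]["w"]
    print(
        f"{dev}: {r['r']:.1f} reads/s, {r['w']:.1f} writes/s, "
        f"{r['kw']:.0f} KB/s written, {avg_write_kb:.1f} KB per write"
    )
```

For `ada0` this works out to roughly 417 writes per second against about 19 reads per second, i.e. the load in that interval is dominated by small writes.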
The file-system is mounted with the following options:
~~~~
ufs rw,noatime
~~~~
The `dumpefs` of the file-system outputs the following:
~~~~
magic 19540119 (UFS2) time Sat Jun 4 05:59:23 2016
superblock location 65536 id [ 56cb7a3f 33fd7a56 ]
ncg 2897 size 464257019 blocks 449679279
bsize 32768 shift 15 mask 0xffff8000
fsize 4096 shift 12 mask 0xfffff000
frag 8 shift 3 fsbtodb 3
minfree 8% optim time symlinklen 120
maxbsize 32768 maxbpg 4096 maxcontig 4 contigsumsize 4
nbfree 56167793 ndir 265137 nifree 232205846 nffree 9111
bpg 20035 fpg 160280 ipg 80256 unrefs 0
nindir 4096 inopb 128 maxfilesize 2252349704110079
sbsize 4096 cgsize 32768 csaddr 5056 cssize 49152
sblkno 24 cblkno 32 iblkno 40 dblkno 5056
cgrotor 0 fmod 0 ronly 0 clean 0
metaspace 6408 avgfpdir 64 avgfilesize 16384
flags soft-updates+journal
fsmnt /some-path
volname swuid 0 providersize 464257019
~~~~
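A few quantities can be derived from the `dumpefs` figures above with simple arithmetic, which may help when judging the current parameters. The sketch below is in Python; the expected number of files (4 stores of about 50 million each) comes from the description earlier in this message, and the assumption that each 2-3 KiB file occupies exactly one 4 KiB fragment ignores directories and other metadata.

```python
# Values copied from the dumpefs output above.
NCG = 2897              # number of cylinder groups
IPG = 80256             # inodes per cylinder group
NIFREE = 232205846      # free inodes
SIZE_FRAGS = 464257019  # file-system size, in fragments
FSIZE = 4096            # fragment size, in bytes

# Assumption taken from the description, not from dumpefs.
EXPECTED_FILES = 4 * 50_000_000

total_inodes = NCG * IPG
used_inodes = total_inodes - NIFREE
size_bytes = SIZE_FRAGS * FSIZE
bytes_per_inode = size_bytes / total_inodes

# Rough lower bound on data space: one fragment per small file.
data_bytes = EXPECTED_FILES * FSIZE

print(f"total inodes:      {total_inodes}")
print(f"inodes in use:     {used_inodes}")
print(f"bytes per inode:   {bytes_per_inode:.0f}")
print(f"file-system size:  {size_bytes / 2**40:.2f} TiB")
print(f"space for files:   {data_bytes / 2**40:.2f} TiB")
print(f"inode headroom:    {total_inodes - EXPECTED_FILES}")
```

With these numbers the inode count is above the expected number of files, but not by a wide margin, so it is one of the things worth re-checking before re-formatting with different parameters.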
Thus I would like to ask the community what I can tune (even by
re-formatting) to make it more "responsive", and alternatively I am
open to another file-system type, perhaps more suited for this
use-case.
Thanks,
Ciprian.
More information about the freebsd-questions
mailing list