Large Filesystem Woes
Peter Jeremy
peterjeremy at optushome.com.au
Sat Jan 10 14:55:25 PST 2004
On Fri, Jan 09, 2004 at 11:35:51AM -0800, Tom Arnold wrote:
>Building a box that's going to house many billions of small files. Think
>innd circa 1998, or someone trying to house AOL's mail system on Cyrus or
>something.
This is probably going to stress any filesystem. You might like to
consider an alternative approach to storing the files (e.g. some sort of
database).
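As a rough illustration of the database approach, here is a minimal sketch that uses Python's standard sqlite3 module as a name-to-contents store. The table `files` and the helpers `open_store`, `put_file` and `get_file` are hypothetical, and whether SQLite (or any particular database) copes with billions of rows is an assumption that would need testing.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a hypothetical one-table store for small files."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "name TEXT PRIMARY KEY, "
        "data BLOB NOT NULL)"
    )
    return conn


def put_file(conn, name, data):
    """Store `data` (bytes) under `name`, replacing any previous contents."""
    with conn:  # commit on success, roll back on error
        conn.execute(
            "INSERT OR REPLACE INTO files (name, data) VALUES (?, ?)",
            (name, data),
        )


def get_file(conn, name):
    """Return the bytes stored under `name`, or None if there are none."""
    row = conn.execute(
        "SELECT data FROM files WHERE name = ?", (name,)
    ).fetchone()
    return None if row is None else row[0]


conn = open_store()  # in-memory; pass a file path for real use
put_file(conn, "alt.test/12345", b"Subject: test\r\n\r\nhello\r\n")
print(get_file(conn, "alt.test/12345"))
```

Each small file becomes a row rather than an inode plus a directory entry; indexing, sharding and backup are left out of the sketch.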
>To this end I've hung a 3.3TB hardware RAID off a BSD box, broken into
>4 partitions: three of 1TB and one of 300GB.
>Originally this was on a 4.9 box. da0s1 and da0s2 were formatted "stock"
>(-f 2048 -b 16384 -i 8192); da1s1 and da1s2 were both formatted with
>-f 512 -b 4096 -i 512.
I ran '-f 512 -b 4096' on a news server for a while but I found that
'-f 1024 -b 8192' significantly improved performance (at the cost of
a significant increase in disk space usage).
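To put a number on the space side of that tradeoff: a file smaller than one block is stored in fragments, so it occupies its size rounded up to a whole number of fragments. The sketch below assumes every file is smaller than one block and ignores inode and directory overhead; the file sizes in `SIZES` are made-up examples, not measurements from the news server.

```python
def small_file_footprint(size, frag):
    """Bytes of fragment space used by a file smaller than one block.

    Assumes size < block size, so the data lives entirely in
    fragments; inode and directory-entry overhead are not counted.
    """
    # Round up to a whole number of fragments (an empty file stays 0).
    return (size + frag - 1) // frag * frag


# Hypothetical file sizes in bytes, all under 4096.
SIZES = [300, 700, 1500, 2100, 3000]

for frag in (512, 1024):
    used = sum(small_file_footprint(s, frag) for s in SIZES)
    print(f"-f {frag}: {used} bytes on disk for {sum(SIZES)} bytes of data")
```

For these sizes, '-f 512' uses 8704 bytes and '-f 1024' uses 10240 bytes to hold 7600 bytes of data. The performance side of the tradeoff isn't modelled here.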
>Switched to 5.2 and newfs'd the RAID for UFS2. First issue: if the machine
>came up dirty, bgfsck seemed to do its thing and the machine was online and
>usable after about 20 minutes; however, after a few hours I get this error:
>
>fsck: /dev/da1s1e: CANNOT CREATE SNAPSHOT /export/database/.snap/fsck_snapshot: File too large
>fsck: /dev/da1s1e: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
I can't explain this. This means that mount(2) returned EFBIG, which
isn't one of its documented errors. I had a quick look through the
sources and can't see why EFBIG would be returned.
>The second thing I've noticed is that I have lost a lot of space.
>Under 4.9 with UFS, da1s1e was approx. 870GB and da1s2e was around 180GB;
>now I see:
>Filesystem   Size  Used  Avail Capacity iused      ifree %iused  Mounted on
>/dev/da0s1e  992G  4.0K   912G       0%     2  134411260     0%  /export/logs1
>/dev/da0s2e  992G  4.0K   912G       0%     2  134411260     0%  /export/logs2
>/dev/da1s1e  510G  1.0K   469G       0%     2 2148661228     0%  /export/database
>/dev/da1s2e   94G  1.0K    86G       0%     2  395214332     0%  /export/spare
The size of a UFS1 inode is 128 bytes and a UFS2 inode is 256 bytes.
With '-i 512' (one inode for every 512 bytes), UFS2 allocates about half
of your disk space to inodes, versus about a quarter for UFS1. (You also
have a further overhead of 8 bytes plus the name for each directory entry.)
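The arithmetic behind the lost space can be sketched as follows. It assumes newfs creates roughly one inode per 512 bytes of the partition, which is what the df figures above imply, and cross-checks the result against the inode counts for /dev/da1s1e. The `dirent_size` helper is my reading of the UFS directory-entry rounding (8-byte header plus the NUL-terminated name, padded to a multiple of 4 bytes) and should be treated as an assumption.

```python
GIB = 1024 ** 3


def inode_fraction(inode_size, bytes_per_inode):
    """Approximate share of the partition taken by the inode tables."""
    return inode_size / bytes_per_inode


def dirent_size(namelen):
    """Approximate UFS directory entry size for a name of `namelen` bytes:
    an 8-byte header plus the NUL-terminated name, rounded up to 4 bytes
    (assumed rounding rule)."""
    return (8 + namelen + 1 + 3) & ~3


print(inode_fraction(128, 512))  # UFS1 with -i 512: 0.25
print(inode_fraction(256, 512))  # UFS2 with -i 512: 0.5

# Cross-check with the df output for /dev/da1s1e (iused + ifree inodes).
inodes = 2 + 2148661228
inode_gib = inodes * 256 / GIB
print(round(inode_gib), round(inode_gib) + 510)  # inode GiB, approx. partition GiB
```

That works out to roughly 512GiB of inodes alongside the 510G df reports as Size, i.e. a partition of about 1TiB split roughly in half.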
>I'm not certain whether I've run into some kind of weird limit here, a bug,
>or what, and am looking for ideas to pursue before I'm stuck going to an OS
>with a journaled filesystem.
Inode numbers are supposed to be u_int32_t, but it's possible that they
are being (incorrectly) treated as signed somewhere, and you have more
than 2^31 inodes on da1s1e (2148661228 free versus 2^31 = 2147483648).
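To illustrate the signedness hypothesis: inode numbers on da1s1e run past 2^31 - 1, and any value above that becomes negative if code stores a u_int32_t in a signed 32-bit int (assuming two's-complement wraparound, as on the usual hardware). This only shows the arithmetic; where, if anywhere, the kernel or fsck actually does this is unverified.

```python
INT32_MAX = 2**31 - 1

# Total inodes on /dev/da1s1e from the df output (iused + ifree).
TOTAL_INODES = 2 + 2148661228


def as_int32(value):
    """Reinterpret an unsigned 32-bit value as a two's-complement int32,
    which is what C code storing a u_int32_t in an int typically gets."""
    return (value + 2**31) % 2**32 - 2**31


highest = TOTAL_INODES - 1  # inode numbers start at 0
print(highest > INT32_MAX)  # True
print(as_int32(highest))    # -2146306067
```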
Moving to a journalled filesystem won't necessarily help. I use
DEC/Compaq/HP AdvFS at work: each file needs at least 282 bytes of
metadata (under some circumstances, it can require multiple 282-byte
metadata blocks) and, from memory, it is limited to 2^31 (or maybe 2^32)
files. Our main fileserver has a filesystem with 2.7e6 files and we
are continually running into undocumented "features" (aka bugs) as a
result of the large number of files. (OTOH, I have no problems with
1.9e6 files in a UFS1 partition on a FreeBSD box.)
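A back-of-the-envelope version of the AdvFS comparison, using only the figures above: 282 bytes is a lower bound per file, and the file-count limit is uncertain (2^31 or maybe 2^32), so it is a parameter. The helper `advfs_estimate` is hypothetical, and the 4-billion-file case is a stand-in for "many billions" of files, not a measured workload.

```python
ADVFS_MIN_BYTES_PER_FILE = 282  # minimum per-file metadata, from memory


def advfs_estimate(nfiles, file_limit=2**31):
    """Return (minimum metadata bytes, fits under file_limit).

    The byte count is a lower bound: some files need several
    282-byte metadata blocks.
    """
    return nfiles * ADVFS_MIN_BYTES_PER_FILE, nfiles <= file_limit


for nfiles in (2_700_000, 4_000_000_000):
    meta, fits = advfs_estimate(nfiles)
    print(f"{nfiles:,} files: at least {meta / 1e9:.2f} GB of metadata, "
          f"within 2^31 limit: {fits}")
```

Even at the lower bound, a few billion files means on the order of a terabyte of metadata.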
Peter