UFS Subdirectory limit.

Scott scottl at samsco.org
Sun Mar 27 10:39:01 PST 2005


Robert Watson wrote:
> On Sat, 26 Mar 2005, David Malone wrote:
> 
> 
>>>Also, the more important
>>>concern is that large directories simply don't scale in UFS.  Lookups
>>>are a linear operation, and while DIRHASH helps, it really doesn't scale
>>>well to 150k entries.
>>
>>It seems to work passably well actually, not that I've benchmarked it
>>carefully at this size. My junkmail maildir has 164953 entries at the
>>moment, and is pretty much continuously appended to without creating any
>>problems for the machine it lives on. Dirhash doesn't care if the
>>entries are subdirectories or files. 
>>
>>If the directory entries are largely static, the name cache should do
>>all the work, and it is well capable of dealing with lots of files. 
>>
>>We should definitely look at what sort of filesystem features we're
>>likely to need in the future, but I just wanted to see if we can offer
>>people a solution that doesn't mean waiting for FreeBSD 6 or 7. 
> 
> 
> FWIW, I regularly use directories with several hundred thousand files in
> them, and it works quite well for the set of operations I perform
> (typically, I only append new entries to the directory).  This is with a
> cyrus server hosting fairly large shared folders -- in Cyrus, a
> maildir-like format is used.  For example, the lists.linux.kernel
> directory references 430,000 individual files.  Between UFS_DIRHASH and
> Cyrus's use of a cache file, opening the folder primarily consists of
> mmap'ing the cache file and then doing lookups, which occur quite quickly. 
> My workload doesn't currently require large numbers of directories
> referenced by a similar directory, but based on the results I've had with
> large numbers of files, I can say it would likely work fine, subject to
> UFS's ability to express it. 
> 
> Robert N M Watson
> 
> 

Luckily, linear reads through a directory are nearly O(1) in UFS, since
ufs_lookup() caches the offset of the last entry read so that a
subsequent call doesn't have to start from the beginning.  I suspect
that this, along with the namei cache, DIRHASH, and Cyrus's own cache
file, all combine to make reading the spool directories painless for
you.  I also suspect that little manual sorting is going on, since
Cyrus chooses names for new entries that are naturally sorted.  I'm
still not sure I would consider this behaviour representative of the
norm, though.  It would be quite interesting to profile the system
while Cyrus appends or deletes a mail file in one of your large spool
directories.  Would an application that isn't as well written as Cyrus
behave as well?  What about an application like Squid?

Scott


More information about the freebsd-fs mailing list