UFS Subdirectory limit.

David Malone dwmalone at maths.tcd.ie
Sun Mar 27 12:45:11 PST 2005


> Luckily, linear reads through a directory are nearly O(1) in UFS since
> ufs_lookup() caches the offset to the last entry read so that a
> subsequent call doesn't have to start from the beginning.

(dirhash also has an equivalent optimisation, 'cos that bit of
ufs_lookup code isn't called when dirhash is in use)
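As a toy model (not the actual kernel code), the optimisation quoted above amounts to caching the offset just past the last matched entry, so an in-order sweep of a directory restarts each search where the previous one left off instead of at the beginning:

```python
# Hypothetical sketch of the cached-offset idea behind ufs_lookup():
# remember where the last hit was and resume the linear scan there,
# wrapping around at most once.

def make_lookup(entries):
    state = {"diroff": 0, "compares": 0}

    def lookup(name):
        n = len(entries)
        for i in range(n):                       # wrap around at most once
            idx = (state["diroff"] + i) % n
            state["compares"] += 1
            if entries[idx] == name:
                state["diroff"] = (idx + 1) % n  # resume point for next call
                return idx
        return None

    return lookup, state

entries = ["file%d" % i for i in range(1000)]
lookup, state = make_lookup(entries)
for name in entries:                             # linear read through the dir
    lookup(name)
# Each in-order lookup hits on its first comparison, so the whole sweep
# costs n comparisons rather than ~n^2/2 without the cached offset.
```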

> Would
> an application that isn't as well-written as cyrus behave as well?  What
> about an application like Squid?

Random lookups should be almost O(1) with dirhash when you have
many operations to amortise the cost of building the hash over. You
lose out with dirhash when you make a small number of accesses to a
large directory and all those entries live close to the beginning
of the directory (or possibly when you're thrashing against dirhash's
memory limit).

If the directory entries are actually constant (as is the case with
squid in truncate mode), then you should get ~O(1) but with a
slightly smaller constant than when the directory entries are
changing.
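A back-of-the-envelope model of that amortisation argument (my own sketch, not the kernel implementation): the first lookup pays O(n) to hash every entry, later lookups are O(1), so the average cost over k lookups is roughly n/k + 1 and only beats a linear scan once k is large enough.

```python
# Toy cost model of dirhash: build the hash lazily on first use
# (O(n) units of work), then answer each lookup in O(1).

class DirHash:
    def __init__(self, entries):
        self.entries = entries
        self.table = None
        self.work = 0                # units of work performed so far

    def lookup(self, name):
        if self.table is None:       # build on first use, like dirhash
            self.table = {e, }.__class__()  # placeholder, replaced below
            self.table = {e: i for i, e in enumerate(self.entries)}
            self.work += len(self.entries)
        self.work += 1
        return self.table.get(name)

d = DirHash(["f%d" % i for i in range(10000)])
for i in range(100):
    d.lookup("f%d" % (i * 97 % 10000))
# 100 lookups cost 10000 + 100 units, i.e. ~101 per lookup; with 10000
# lookups the average drops toward 2 -- the build cost amortises away.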
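A back-of-the-envelope model of that amortisation argument (my own sketch, not the kernel implementation): the first lookup pays O(n) to build the hash, later lookups are O(1), so the average cost over k lookups is roughly n/k + 1 and only beats repeated linear scans once k is large enough.

```python
# Toy cost model of dirhash: build the hash lazily on first use
# (O(n) units of work), then answer each lookup in O(1).

class DirHash:
    def __init__(self, entries):
        self.entries = entries
        self.table = None
        self.work = 0                # units of work performed so far

    def lookup(self, name):
        if self.table is None:       # build on first use, like dirhash
            self.table = {e: i for i, e in enumerate(self.entries)}
            self.work += len(self.entries)
        self.work += 1
        return self.table.get(name)

d = DirHash(["f%d" % i for i in range(10000)])
for i in range(100):
    d.lookup("f%d" % (i * 97 % 10000))
# 100 lookups cost 10000 + 100 units, i.e. ~101 per lookup; with 10000
# lookups the average drops toward 2 -- the build cost amortises away.
```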

Just to check, I'm running a benchmark at the moment to compare
150k directories arranged either as:

	1) a flat 150k subdirectories of one directory, or
	2) two levels with ~387 subdirectories at each level
	   (387 * 387 ~= 150k).
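For case 2, the mapping from a directory number to its two-level path could look something like this (a hypothetical sketch; `FANOUT` and `leaf_path` are my own names, not anything from the benchmark):

```python
import os

FANOUT = 387  # ~sqrt(150000), so neither level exceeds ~387 entries

def leaf_path(root, i):
    # Directory number i lands in root/<i // FANOUT>/<i % FANOUT>,
    # keeping every directory small instead of one 150k-entry monster.
    return os.path.join(root, str(i // FANOUT), str(i % FANOUT))
```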

At the moment it looks like accessing files in either structure
performs equivalently, but it is a bit slower to build/remove the
flat structure. I'll post the results once the run is complete.

	David.


More information about the freebsd-fs mailing list