ZFS and large directories - caveat report

Luiz Otavio O Souza lists.br at gmail.com
Thu Jul 21 17:08:04 UTC 2011


On Jul 21, 2011, at 12:45 PM, Ivan Voras wrote:

> I'm writing this mostly for future reference / archiving, and also in case someone has an idea on how to improve the situation.
> 
> A web server I maintain was hit by a DoS attack, which caused more than 4 million PHP session files to be created. The session files are sharded into 32 directories in a single level - normally more than enough for this web server, as the number of users is only a couple of thousand. With the DoS, the number of files per shard directory rose to about 130,000.
> 
> The problem is: ZFS has proven horribly inefficient with such large directories. I have other, more loaded servers with similarly bad / large directories on UFS where the problem is not nearly as serious as here (probably due to the large dirhash). On this system, any operation which touches even only the parent of these 32 shards (e.g. "ls") takes seconds, and a simple "find | wc -l" on one of the shards takes more than 30 minutes (I stopped it after 30 minutes). Another symptom is that SIGINT-ing such a find process takes 10-15 seconds to complete (sic! this likely means the kernel operation cannot be interrupted for that long).
> 
> This wouldn't be a problem by itself, but operations on such directories eat IOPS - clearly visible with the "find" test case - making the rest of the services on the server suffer as collateral damage. Apparently a huge amount of seeking is being done, even though I would expect all the data to be cached for read operations - and somehow the seeking from this operation takes priority over / livelocks other operations on the same ZFS pool.
> 
> This is on a fresh 8-STABLE AMD64, pool version 28 and zfs version 5.
> 
> Is there an equivalent of the UFS dirhash memory setting for ZFS (i.e. the size of the metadata cache)?

Hello Ivan,

I have somewhat similar problems on a client machine that needs to store a large number of files.

I have 4,194,303 (0x3fffff) files created on the FS (the unused files are pre-created with zero size - a precaution from the UFS days to avoid the 'no more free inodes on FS' problem).

And I break the files up into paths like mybasedir/3f/ff/ff, so under no circumstances do I have a 'big amount of files' in a single directory.
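Roughly, the mapping from a file index to its path looks like this (a simplified sketch only - the real code lives in the application, and 'mybasedir' just stands in for the actual path):

    # hypothetical sketch: derive the shard path for a given file index
    idx=4194303                       # any index in 0..0x3fffff
    hex=$(printf '%06x' "$idx")       # e.g. 3fffff
    echo "mybasedir/$(echo $hex | cut -c1-2)/$(echo $hex | cut -c3-4)/$(echo $hex | cut -c5-6)"
    # -> mybasedir/3f/ff/ff

With two hex digits per level, no single directory ever holds more than 256 entries.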

The general usage on this server is fine, but the periodic (daily) scripts take almost a day to complete, and the server is slow as hell while they are running.

All I need to do is kill 'find' to get the machine back to 'normal'.

I have not stopped to look at it in detail, but from the little I have checked, it looks like stat() calls take a long time on ZFS files.
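I have not profiled it properly, but a quick way to check where the time goes (assuming your truss has the -c summary option; ktrace/kdump would do as well) is something like:

    # run find over one subtree and summarize syscall counts and times;
    # find stat()s every entry it visits, so the stat family should dominate
    truss -c find mybasedir/3f -type f > /dev/null

If most of the system time shows up in the stat-family calls, that would back up the impression that it is the per-file metadata lookups, not the directory layout, that hurt.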

Previously, we had this running on UFS with a database of 16,777,215 (0xffffff) files without any kind of trouble (I have since reduced the database size to keep the daily scripts' run time under control).

The periodic script is simply doing its job of verifying setuid files (and comparing the list with the previous one).
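(That is the standard security check from periodic(8), /etc/periodic/security/100.chksetuid here. The exact invocation varies between releases, but in spirit it walks every local filesystem doing something like the following, which means one stat() per file:)

    # list setuid/setgid executables on one filesystem, staying on that FS;
    # the daily security run then diffs this against the previous day's list
    find -x / -type f \( -perm -u+s -o -perm -g+s \) -exec ls -liTd {} +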

So, yes, I can confirm that running 'find' on a ZFS filesystem with a lot of files is very, very slow (and it does not look related to how the files are distributed on the FS).

But sorry, I have no idea how to improve that situation (yet).

Regards,
Luiz


