copying millions of small files and millions of dirs

Thu Aug 15 18:37:07 UTC 2013

On Aug 15, 2013, at 11:26 AM, Charles Swiger wrote:

> On Aug 15, 2013, at 11:13 AM, aurfalien <aurfalien at gmail.com> wrote:
>> Is there a faster way to copy files over NFS?
> 
> Probably.

Ok, thanks for the specifics.

>> Currently breaking up a simple rsync over 7 or so scripts which copies 22 dirs having ~500,000 dirs or files each.
> 
> There's a maximum useful concurrency which depends on how many disk spindles and what flavor of RAID is in use; exceeding it will result in thrashing the disks and heavily reducing throughput due to competing I/O requests.  Try measuring aggregate performance when running fewer rsyncs at once and see whether it improves.

It's 35 disks broken into 7 striped RAIDZ groups with an SLC-based ZIL and no atime; the server itself has 128GB of ECC RAM.  I didn't have time to tune or really learn ZFS, but at this point it's only backing up the data for emergency purposes.
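
For reference, the measurement suggested above (total wall-clock time at different numbers of simultaneous rsyncs) could be sketched roughly like this. It is only a sketch under assumptions: the names time_levels and rsync_dir are made up for illustration, the paths are placeholders, and "rsync -a" stands in for whatever options the real scripts use.

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor


def rsync_dir(pair):
    # Hypothetical job: one rsync per (source, destination) pair.
    src, dst = pair
    subprocess.run(["rsync", "-a", src, dst], check=True)


def time_levels(job, items, levels):
    """Run job(item) over all items at each concurrency level.

    Returns a dict mapping concurrency level -> elapsed seconds.
    """
    results = {}
    for n in levels:
        start = time.monotonic()
        # Threads are enough here: each one just waits on its job.
        with ThreadPoolExecutor(max_workers=n) as pool:
            list(pool.map(job, items))  # list() waits for all jobs, raises on failure
        results[n] = time.monotonic() - start
    return results


# Example (placeholder paths, not the real layout):
# pairs = [("/mnt/nfs/job001/", "/backup/job001/"), ...]
# print(time_levels(rsync_dir, pairs, [1, 2, 4, 7]))
```

Each level would need a comparable, not-yet-copied set of directories, since a repeat rsync over already-synced or cached data would finish much faster and skew the comparison.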

> Of course, putting half a million files into a single directory level is also a bad idea, even with dirhash support.  You'd do better to break them up into subdirs containing fewer than ~10K files apiece.

I can't; that's our job structure, obviously developed by script kiddies and not systems people, but I digress.

>> Obviously reading all the meta data is a PITA.
> 
> Yes.
> 
>> Doin 10Gb/jumbos but in this case it don't make much of a hoot of a diff.
> 
> Yeah, probably not-- you're almost certainly I/O bound, not network bound.

Actually, it was network bound with a single rsync process, which is why I broke up the 154 dirs into 7 batches of 22 each.
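
The split itself is nothing fancy. A minimal sketch of the idea, assuming the top-level directories are simply dealt out round-robin (split_batches and the job### names are placeholders, not the actual scripts):

```python
def split_batches(dirs, n_batches):
    """Deal directories round-robin into n_batches lists."""
    batches = [[] for _ in range(n_batches)]
    for i, d in enumerate(sorted(dirs)):
        batches[i % n_batches].append(d)
    return batches


# Placeholder names standing in for the 154 real top-level dirs.
dirs = [f"job{i:03d}" for i in range(154)]
batches = split_batches(dirs, 7)
print([len(b) for b in batches])  # 154 / 7 = 22 per batch
```

Round-robin only balances the count of directories per batch, not the number of files under each one, so batches can still finish at very different times.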

I'll have to acquaint myself with ZFS-centric tools to help me determine what's going on.

But