copying millions of small files and millions of dirs

Charles Swiger cswiger at mac.com
Thu Aug 15 20:23:04 UTC 2013


[ ...combining replies for brevity... ]

On Aug 15, 2013, at 1:02 PM, Frank Leonhardt <frank2 at fjl.co.uk> wrote:
> I'm reading all this with interest. The first thing I'd have tried would be tar (and probably netcat) but I'm probably a bit of a dinosaur. (If someone wants to buy me some really big drives I promise I'll update). If it's really NFS or nothing I guess you couldn't open a socket anyway.

Either tar via netcat or SSH, or dump / restore via a similar pipeline, is quite traditional.  tar is more flexible for partial filesystem copies, whereas dump / restore is oriented towards complete filesystem copies.  If the destination starts off empty, they're probably faster than rsync, but rsync does delta updates, which is a huge win if you're going to be copying changes onto a slightly older version.
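
As a rough sketch of the tar pipeline mentioned above: the helper name, the paths, the port and the host name "desthost" are placeholders of mine, not anything from this thread. The function streams one local tree into another; the commented lines show how the same stream is usually sent over SSH or netcat instead. netcat option syntax varies between implementations, so treat those lines as an assumption to check against your own nc.

```shell
#!/bin/sh
# copy_tree SRC DST: stream SRC through a tar pipe and unpack it in DST.
# Hypothetical helper for illustration; DST must already exist.
copy_tree() {
    src=$1
    dst=$2
    # -C changes directory first, so the archive holds relative paths.
    # -p asks the extracting tar to preserve permissions.
    tar -C "$src" -cf - . | tar -C "$dst" -xpf -
}

# Same stream over SSH (desthost and /dest are placeholders):
#   tar -C /src -cf - . | ssh desthost 'tar -C /dest -xpf -'
# Over netcat, assuming a listener was started first on the receiver:
#   receiver:  nc -l 9999 | tar -C /dest -xpf -
#   sender:    tar -C /src -cf - . | nc desthost 9999
```

Usage would be something like "copy_tree /src /mnt/dest" for a local or NFS-mounted destination.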

Anyway, you're entirely right that the capabilities of the source matter a great deal.
If it could do zfs send / receive, or similar snapshot mirroring, that would likely do better than userland tools.

> I'd be interested to know whether tar is still worth using in this world of volume managers and SMP.

Yes.

On Aug 15, 2013, at 12:14 PM, aurfalien <aurfalien at gmail.com> wrote:
[ ... ]
>>>>> Doin 10Gb/jumbos but in this case it don't make much of a hoot of a diff.
>>>> 
>>>> Yeah, probably not-- you're almost certainly I/O bound, not network bound.
>>> 
>>> Actually it was network bound via 1 rsync process which is why I broke up 154 dirs into 7 batches of 22 each.
>> 
>> Oh.  Um, unless you can make more network bandwidth available, you've saturated the bottleneck.
>> Doing a single copy task is likely to complete faster than splitting up the job into subtasks in such a case.
> 
> Well, using iftop, I am now at least able to get ~1Gb with 7 scripts going where before it was in the 10Ms with 1.
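
For reference, the batching described above (154 directories in 7 groups of 22, one script per group) might be sketched like this. The helper name, the batch file prefix, the paths and the host "desthost" are placeholders of mine, and the rsync invocation in the comment is an assumption, not the one actually used.

```shell
#!/bin/sh
# make_batches LISTFILE PER_BATCH PREFIX
# Split a file of directory names (one per line) into batch files
# holding PER_BATCH names each: PREFIXaa, PREFIXab, ...
make_batches() {
    split -l "$2" "$1" "$3"
}

# Assumed usage: one background loop per batch file, then wait for all.
#   ls /src > dirs.txt
#   make_batches dirs.txt 22 batch.
#   for b in batch.*; do
#       ( while read -r d; do
#             rsync -a "/src/$d" desthost:/dest/
#         done < "$b" ) &
#   done
#   wait
```

With 154 names and 22 per batch, split produces 7 batch files, so 7 copy loops run concurrently.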

1 gigabyte of data per second is pretty decent for a 10Gb link; 10 MB/s obviously wasn't close to saturating a 10Gb link.
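
To make the arithmetic explicit, here is a small sketch. It assumes decimal units (1 GB = 1000 MB), ignores protocol overhead, and the function name is made up for illustration.

```shell
#!/bin/sh
# link_pct MB_PER_SEC LINK_GBIT
# Print what percentage of a link of LINK_GBIT gigabits/s is used
# by a transfer running at MB_PER_SEC megabytes/s (decimal units).
link_pct() {
    awk -v mb="$1" -v gbit="$2" 'BEGIN {
        capacity_mb = gbit * 1000 / 8    # 10 Gb/s -> 1250 MB/s
        printf "%.1f\n", mb * 100 / capacity_mb
    }'
}

link_pct 1000 10   # 1 GB/s on a 10Gb link: prints 80.0
link_pct 10 10     # 10 MB/s on a 10Gb link: prints 0.8
```

If the ~1Gb reported by iftop was one gigabit rather than one gigabyte per second, that is about 125 MB/s, and "link_pct 125 10" gives 10.0 instead.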

Regards,
-- 
-Chuck



More information about the freebsd-questions mailing list