copying millions of small files and millions of dirs

iamatt iamatt at gmail.com
Thu Aug 15 21:21:55 UTC 2013


I would use NDMP.  That is how we archive our NAS crap (Isilon stuff), but
we have the backend accelerators.  Not sure if there is NDMP for FreeBSD.
Like another poster said, you are most likely I/O bound anyway.
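
A quick way to check that on the FreeBSD box is to watch per-disk busy
percentages while the copies run, e.g. (rough sketch, plain gstat from the
base system):

  # sustained ~100 %busy on the data disks while rsync runs means the
  # bottleneck is the disks, not the network
  gstat -I 1s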


On Thu, Aug 15, 2013 at 2:14 PM, aurfalien <aurfalien at gmail.com> wrote:

>
> On Aug 15, 2013, at 11:52 AM, Charles Swiger wrote:
>
> > On Aug 15, 2013, at 11:37 AM, aurfalien <aurfalien at gmail.com> wrote:
> >> On Aug 15, 2013, at 11:26 AM, Charles Swiger wrote:
> >>> On Aug 15, 2013, at 11:13 AM, aurfalien <aurfalien at gmail.com> wrote:
> >>>> Is there a faster way to copy files over NFS?
> >>>
> >>> Probably.
> >>
> >> Ok, thanks for the specifics.
> >
> > You're most welcome.
> >
> >>>> Currently breaking up a simple rsync into 7 or so scripts, each of
> >>>> which copies 22 dirs holding ~500,000 dirs or files apiece.
> >>>
> >>> There's a maximum useful concurrency which depends on how many disk
> >>> spindles and what flavor of RAID is in use; exceeding it will result in
> >>> thrashing the disks and heavily reducing throughput due to competing I/O
> >>> requests.  Try measuring aggregate performance when running fewer rsyncs
> >>> at once and see whether it improves.
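> >>>
> >>> For example (untested sketch; the paths and host name are made up), you
> >>> could drive the whole tree with xargs -P and just vary N between runs to
> >>> find the sweet spot:
> >>>
> >>>   #!/bin/sh
> >>>   # run $N rsyncs at a time, one per top-level dir under /src;
> >>>   # assumes no spaces in the dir names -- tune N up or down and
> >>>   # compare aggregate throughput at each setting
> >>>   N=4
> >>>   ls /src | xargs -P "$N" -I {} \
> >>>       rsync -a "/src/{}/" "backuphost:/dst/{}/"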
> >>
> >> It's 35 disks broken into 7 striped RAID-Z groups with an SLC-based ZIL
> >> and no atime; the server itself has 128GB of ECC RAM.  I didn't have
> >> time to tune or really learn ZFS, but at this point it's only backing up
> >> the data for emergency purposes.
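> >>
> >> (For the record, that works out to roughly the following -- device
> >> names made up, the real pool was of course created long ago:
> >>
> >>   zpool create tank \
> >>       raidz da0  da1  da2  da3  da4  \
> >>       raidz da5  da6  da7  da8  da9  \
> >>       raidz da10 da11 da12 da13 da14 \
> >>       raidz da15 da16 da17 da18 da19 \
> >>       raidz da20 da21 da22 da23 da24 \
> >>       raidz da25 da26 da27 da28 da29 \
> >>       raidz da30 da31 da32 da33 da34 \
> >>       log ada0
> >>   zfs set atime=off tank
> >>
> >> i.e. 7 five-disk raidz vdevs striped together, plus the SLC SSD as a
> >> dedicated log device.)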
> >
> > OK.  If you've got 7 independent groups and can use separate network
> > pipes for each parallel copy, then using 7 simultaneous scripts is
> > likely reasonable.
> >
> >>> Of course, putting half a million files into a single directory level
> >>> is also a bad idea, even with dirhash support.  You'd do better to
> >>> break them up into subdirs containing fewer than ~10K files apiece.
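> >>>
> >>> For instance (untested; the path is made up), a hash-based fan-out
> >>> keeps the buckets balanced -- 256 buckets puts ~500K files at roughly
> >>> 2K entries per subdir:
> >>>
> >>>   #!/bin/sh
> >>>   # fan files out into subdirs named after the first two hex chars
> >>>   # of an md5 of each filename
> >>>   cd /big/dir || exit 1
> >>>   for f in *; do
> >>>       [ -f "$f" ] || continue
> >>>       b=$(printf '%s' "$f" | md5 | cut -c1-2)
> >>>       mkdir -p "$b" && mv "$f" "$b/"
> >>>   done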
> >>
> >> I can't; that's our job structure, obviously developed by script
> >> kiddies and not systems people, but I digress.
> >
> > Identifying something which is "broken as designed" is still helpful,
> > since it indicates what needs to change.
> >
> >>>> Obviously reading all the metadata is a PITA.
> >>>
> >>> Yes.
> >>>
> >>>> Doing 10GbE with jumbo frames, but in this case it doesn't make much
> >>>> of a difference.
> >>>
> >>> Yeah, probably not -- you're almost certainly I/O bound, not network
> >>> bound.
> >>
> >> Actually it was network bound via 1 rsync process, which is why I broke
> >> up 154 dirs into 7 batches of 22 each.
> >
> > Oh.  Um, unless you can make more network bandwidth available, you've
> > saturated the bottleneck.  Doing a single copy task is likely to
> > complete faster than splitting up the job into subtasks in such a case.
>
> Well, using iftop, I am now at least able to get ~1Gb/s with 7 scripts
> going, whereas before it was in the 10Mb/s range with 1.
>
> Also, physically looking at my ZFS server, the drive lights are now
> blinking faster, like every second.  Whereas before it was sort of
> seldom, like every 3 seconds or so.
>
> I was thinking of perhaps zipping dirs up and then transferring the file
> over, but it would probably take as long to zip/unzip.
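>
> (A tar pipe would at least skip the temp file -- something like this
> untested one-liner, with made-up paths and host:
>
>   tar cf - -C /src projectdir | ssh backuphost 'tar xpf - -C /dest'
>
> though it still has to stat every file, so the metadata cost doesn't go
> away.)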
>
> This bloody project structure we have is nuts.
>
> - aurf

