Jan Grant Jan.Grant at
Fri Oct 1 13:35:19 PDT 2004

On Fri, 1 Oct 2004, Brian McCann wrote:

> On Thu, 30 Sep 2004 13:59:05 -0700 (PDT), Richard Lynch <ceo at> wrote:
> > 
> > Brian McCann wrote:
> > >      Hi all...I'm having a conceptual problem I can't get around and
> > > was hoping someone can change my focus here.  I've been backing up
> > > roughly 6-8 million small files (roughly 2-4k each) using dump, but
> > > restores take forever due to the huge number of files and directories.
> > >  Luckily, I haven't had to restore for an emergency yet...but if I
> > > need to, I'm kinda stuck.  I've looked at distributed file systems
> > > like CODA, but the number of files I have to deal with will make it
> > > choke.  Can anyone offer any suggestions?  I've pondered running
> > > rsync, but am very worried about how long that will take...
> > 
> > Do the files change a lot, or is it more like a few files added/changed
> > every day, and the bulk don't change?
> > 
> > If it's the latter, you could maybe get best performance from something
> > like Subversion (a successor to CVS).
> > 
> > Though I suspect rsync would also do well in that case.
> > 
> > If a ton of those files are changing all the time, try doing a test on
> > creating a tarball and then backing up the tarball.  That may be a simple
> > manageable solution.  There are probably other more complex solutions of
> > which I am ignorant :-)
> I have the case where a new file is created about every second or two,
> nothing gets changed, but files get deleted occasionally (it's a mail
> server).  I thought of using tar, but it would be just as slow as dump
> I would think.  I've thought of breaking it up into chunks, but that
> still doesn't solve my speed issue...I'm beginning to consider using
> dd since it reads the actual disk bits, and just hope that a) I don't
> ever need one file and b) the system I restore to has at least as much
> space as the original server.  Any other thoughts anyone?

You might want to experiment with something like rsync to maintain a
"live" (i.e., on a filesystem) second copy. If you do this, don't be put
off by the initial rsync time, which may well take ages; tar or
dump/restore may be faster for getting the second copy in place
initially. Rsync over such a large filesystem may take quite a while,
but the best bet is to actually try it and see whether it meets your
needs.
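
As a rough illustration of the rsync mirror idea, here is a minimal
Python sketch that only assembles and runs the rsync command line. The
source path and the destination "backuphost:/backup/mail" are
placeholders I made up, and the choice of -a plus --delete is my
assumption about what a mail spool mirror would want, not something
tested against a store of this size. The dry_run option lets you see
what rsync would do before committing to it.

```python
import shlex
import subprocess


def build_rsync_command(source, dest, delete=True, dry_run=False):
    """Return the argv list for mirroring `source` into `dest` with rsync."""
    # -a: archive mode (recursive, preserves permissions, times, owners).
    cmd = ["rsync", "-a"]
    if delete:
        # Remove files on the copy that no longer exist on the source,
        # so deleted mail does not linger on the mirror.
        cmd.append("--delete")
    if dry_run:
        # Report what would be transferred without changing anything.
        cmd.append("--dry-run")
    # A trailing slash on the source means "the contents of this
    # directory" rather than the directory itself.
    cmd += [source.rstrip("/") + "/", dest]
    return cmd


def mirror(source, dest, **options):
    """Run rsync and raise CalledProcessError if it exits non-zero."""
    cmd = build_rsync_command(source, dest, **options)
    print("running:", shlex.join(cmd))
    return subprocess.run(cmd, check=True)
```

Something like mirror("/var/mail", "backuphost:/backup/mail",
dry_run=True) would be a reasonable first experiment; timing a second,
real run after the initial copy is in place gives a better idea of the
steady-state cost than the first run does.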

Obviously a restore of a mail repository is a pretty awful thing to have 
to do. Amongst other things, users can find the "resurrection" of
deleted mails to be a real pain. You might want to see if your mail repo 
can generate some kind of replay log - if so, this might be the best 
route for minimising the amount of time needed to synchronise mailstores 
and to get the closest fidelity out of the copy.
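
To make the replay-log idea concrete, here is a small Python sketch.
The log format is entirely hypothetical: I am assuming the mail
software can emit an ordered list of ("deliver", path) and ("delete",
path) entries with paths relative to the mailstore root. Whether your
mail repository can produce anything like this is something you would
have to check.

```python
import os
import shutil


def replay(log_entries, primary_root, replica_root):
    """Apply (action, relative_path) entries, in order, to the replica.

    The entry format is an assumption made for illustration only.
    """
    for action, rel_path in log_entries:
        source = os.path.join(primary_root, rel_path)
        target = os.path.join(replica_root, rel_path)
        if action == "deliver":
            if not os.path.exists(source):
                # Already deleted on the primary; a later "delete"
                # entry in the log will account for it.
                continue
            os.makedirs(os.path.dirname(target), exist_ok=True)
            shutil.copy2(source, target)
        elif action == "delete":
            # Deleting on the replica as well is what keeps removed
            # mail from reappearing after a restore.
            if os.path.exists(target):
                os.remove(target)
        else:
            raise ValueError("unknown log action: %r" % (action,))
```

The point of the sketch is that only the files named in the log are
touched, so the cost of a sync is proportional to the amount of change
rather than to the 6-8 million files in the store.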

Breaking your mailstore into separate chunks may well help. Yes, the 
total time for a dump/restore may be close to your current state of 
play, but if you can split the partitions between machines then you have 
the option to perform these in parallel.
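
A rough sketch of the chunking idea, in Python. It is a single-machine
stand-in: it splits the top-level directories of the mailstore into
chunks and archives the chunks concurrently with the standard tarfile
module, whereas the real gain I have in mind comes from putting the
chunks on separate machines. The function names, the round-robin split
and the choice of tar archives are all my own assumptions for
illustration.

```python
import os
import tarfile
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(root, n_chunks):
    """Deal the top-level entries of `root` into n_chunks lists, round-robin."""
    entries = sorted(os.listdir(root))
    return [entries[i::n_chunks] for i in range(n_chunks)]


def archive_chunk(root, names, archive_path):
    """Write the named top-level entries of `root` into one tar archive."""
    with tarfile.open(archive_path, "w") as tar:
        for name in names:
            # arcname keeps paths in the archive relative to `root`.
            tar.add(os.path.join(root, name), arcname=name)
    return archive_path


def archive_in_parallel(root, out_dir, n_chunks=4):
    """Archive each non-empty chunk in its own worker thread."""
    chunks = split_into_chunks(root, n_chunks)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        futures = [
            pool.submit(archive_chunk, root, names,
                        os.path.join(out_dir, "chunk%d.tar" % i))
            for i, names in enumerate(chunks)
            if names
        ]
        # result() re-raises any exception from a worker.
        return [f.result() for f in futures]
```

On one disk the workers will mostly compete for the same spindle, so
don't expect much speed-up from this as written; it is the per-chunk
independence that matters, since each chunk can then be dumped or
restored on its own machine.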

jan grant, ILRT, University of Bristol.
Tel +44(0)117 9287088 Fax +44 (0)117 9287112
"...perl has been dead for more than 4 years." - Abigail in the Monastery

More information about the freebsd-questions mailing list