Slow disk access while rsync - what should I tune?

Matthew Dillon dillon at apollo.backplane.com
Sat Oct 30 22:48:49 UTC 2010


:Thank you all for the answers.
:
:..
:A lot of impact is also produced by rm -rf of old backups. I assume that
:low performance is also related to a large number of hardlinks. There
:was a moment when I had ~15 backups hardlinked by rsync, and rm -rf of

    Yes, hardlinked backups pretty much destroy performance, mainly
    because they destroy all locality of reference on the storage media:
    files that are slowly modified get their own copies, mixed in with
    other 'old' files which have not been modified.  But theoretically
    that should only affect the backup target storage and not the server's
    production storage.

    Here is what I would suggest:  Move the backups off the production
    machine and onto another totally separate machine, then rsync between
    the two machines.  That will solve most of your problems, I think.
    If the backup disk is a single drive then just use a junk box lying
    around somewhere, with the disk installed in it, as your backup system.
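
    For example (the hostname and /backup paths below are made up, and
    the exact rsync options depend on how your backups are currently
    built), the backup box can pull from the production server with
    something like:

        # run this on the backup box, not on the production server;
        # prod.example.com and the /backup paths are placeholders
        rsync -aH --numeric-ids \
            --link-dest=/backup/daily.1 \
            prod.example.com:/ /backup/daily.0/

    The hardlink churn and the rm -rf of old backups then only ever
    touch the backup box's disks.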

    --

    The other half of the problem is the stat()ing of every single file
    on the production server (whether via local rsync or remote rsync).
    If your original statement is accurate and you have in excess of
    11 million files, then the stat()ing will likely force the system vnode
    cache on the production system to cycle.  Whether its maximum is
    100,000 or 500,000 doesn't matter; neither is 11 million, so it will
    cycle.  This in turn will tend to cause the buffer and VM page caches
    (which are linked to the vnode cache) to get blown away as well.
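
    One quick way to see whether that is happening (these are the stock
    FreeBSD sysctl names) is to watch the vnode count while the rsync
    runs:

        # if vfs.numvnodes stays pinned near kern.maxvnodes for the
        # whole run, the vnode cache is cycling
        while :; do
            sysctl kern.maxvnodes vfs.numvnodes
            sleep 10
        done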

    The vnode cache should have code to detect stat() style accesses and
    avoid blowing away unrelated cached vnodes which have cached data
    associated with them, but it's kinda hit-or-miss how well that works.
    It is very hard to tune those sorts of algorithms, and when one is
    talking about an inode:cache ratio of 22:1 even a good algorithm will
    tend to break down.

    Generally speaking, when caches become inefficient server throughput
    goes to hell.  You go from, e.g., 10us to access a file to 6ms to
    access a file, a factor-of-600 slowdown.

:Maybe it is possible to increase disk performance somehow? The server has
:a lot of memory. At this time vfs.ufs.dirhash_maxmem = 67108864 (max
:monitored value for vfs.ufs.dirhash_mem was 52290119) and
:kern.maxvnodes = 500000 (max monitored value for vfs.numvnodes was
:450567). Can increasing these (or other) sysctls help? I ask
:because (as you can see) these tunables have already been increased,
:and I am not sure further increases really make sense.

    I'm not sure how this can be best dealt with in FreeBSD.  If you are
    using ZFS it should be possible to localize or cache the meta-data
    associated with those 11 million+ files on some very fast storage
    (e.g. an SSD).  Doing so will make the stat() portion of the rsync
    go very fast (getting it over with as quickly as possible).
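
    As a rough sketch (the pool and device names below are placeholders,
    and this assumes a ZFS version recent enough to support L2ARC cache
    devices):

        # add an SSD (device name is a placeholder) to the pool as an
        # L2ARC cache device
        zpool add tank cache da1
        # optionally restrict the cache device to metadata only, so the
        # 11M+ inodes and directory entries stay on the fast storage
        zfs set secondarycache=metadata tank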

    With UFS the dirhash code only caches the directory entries, not the
    inode contents (though I'm not 100% positive on that), so it won't help
    much.  The directory entries are already linear, and unless you have
    thousands of files in each directory UFS dirhash will not save much
    in the way of I/O.

:Also, is it possible to limit disk operations for rm -rf somehow? The
:only idea I have at the moment is to replace rm -rf with 'find |
:slow_down_script | xargs rm' (or use a similar patch as for rsync)...

    No, unfortunately there isn't much you can do about this due to
    the fact that the files are hardlinked, other than moving the backup
    storage entirely off the production server, or otherwise determining
    why disk I/O to the backup storage is affecting your primary storage
    and hacking a fix.

    The effect could be indirect... the accesses to the backup
    storage are blowing away the system caches and causing the 
    production storage to get overloaded with I/O.  I don't think
    there is an easy solution other than to move the work off
    the production server entirely.

:And also, maybe there are other ways to create incremental backups
:instead of using rsync/hardlinks? I was thinking about generating a
:list of changed files with my own script and packing it with tar, but I
:did not find a way to remove old backups as easily as it
:is with hardlinks..
:
:Thanks in advance!
:...
:-- 
:// cronfy


    Yes.  Use snapshots.  ZFS is probably your best bet here in FreeBSD-land,
    as ZFS not only has snapshots, it also has a streaming backup feature
    that you can use to stream changes from one ZFS filesystem (i.e. on
    your production system) to another (i.e. on your backup system).
    Both the production system AND the backup system would have to be
    running ZFS to make proper use of the feature.
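
    A minimal sketch of that workflow (the pool/dataset names and the
    'backup' host are placeholders; the very first transfer would be a
    full 'zfs send' without -i):

        # on the production box: take today's snapshot, then stream the
        # incremental delta since yesterday's snapshot to the backup box
        zfs snapshot tank/data@2010-10-30
        zfs send -i tank/data@2010-10-29 tank/data@2010-10-30 | \
            ssh backup zfs recv tank/backups/data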

    But before you start worrying about all of that I suggest taking the
    first step, which is to move the backups entirely off the production
    system.  There are many ways to handle LAN backups.  My personal
    favorite (which doesn't help w/ the stat problem but which is easy 
    to set up) is for the backup system to NFS mount the production system
    and periodically 'cpdup' the production system's filesystems over to
    the backup system.  Then create a snapshot (don't use hardlinks),
    and repeat.  As a fringe benefit the backup system does not have to
    rely on backup management scripts running on the production system...
    i.e. the production system can be oblivious to the mechanics of the
    backup.  And with NFS's rdirplus (NFSv3 here), scanning the production
    filesystem via NFS should go pretty quickly.
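
    Sketched out (hostnames and paths are placeholders, and the backup
    filesystem is assumed here to be a ZFS dataset so that snapshots are
    cheap):

        # on the backup box; 'prod' and the paths are placeholders, and
        # /backup is assumed to be the mountpoint of the tank/backup dataset
        mount -t nfs -o ro,rdirplus prod:/ /mnt/prod
        cpdup /mnt/prod /backup
        zfs snapshot tank/backup@`date +%Y%m%d`
        umount /mnt/prod

    Run that from cron as often as you like; each snapshot is an
    independent point-in-time backup, and old ones can be dropped with
    a single 'zfs destroy'.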

    It is possible for files to be caught mid-change, but it is also
    fairly easy to detect that case if it winds up being a problem.  And,
    of course, more sophisticated methodologies can be built on top.
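
    One simple way to flag such files, building on the loop above (same
    placeholder paths):

        # touch a marker before the cpdup pass; any source file that is
        # newer than the marker afterwards may have been caught mid-change
        # and can be re-copied or re-checked on the next pass
        touch /backup/.pass_started
        cpdup /mnt/prod /backup
        find /mnt/prod -type f -newer /backup/.pass_started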

					-Matt
					Matthew Dillon 
					<dillon at backplane.com>


