Slow disk access while rsync - what should I tune?
Matthew Dillon
dillon at apollo.backplane.com
Sat Oct 30 22:48:49 UTC 2010
:Thank you all for the answers.
:
:..
:A lot of impact is also produced by rm -rf of old backups. I assume that
:low performance is also related to a large numbers of hardlinks. There
:was a moment when I had ~15 backups hardlinked by rsync, and rm -rf of
Yes, hardlinked backups pretty much destroy performance, mainly
because they destroy all locality of reference on the storage media:
files which are slowly modified get their own copies, mixed in with
other 'old' files which have not been modified. But theoretically
that should only affect the backup target storage and not the server's
production storage.
Here is what I would suggest: Move the backups off the production
machine and onto another totally separate machine, then rsync between
the two machines. That will solve most of your problems I think.
If the backup disk is a single drive then just use a junk box lying
around somewhere for your backup system with the disk installed in it.
--
The other half of the problem is the stat()ing of every single file
on the production server (whether via local rsync or remote rsync).
If your original statement is accurate and you have in excess of
11 million files then the stat()ing will likely force the system vnode
cache on the production system to cycle, whether it has a max of
100,000 or 500,000... doesn't matter, it isn't 11 million so it will
cycle. This in turn will tend to cause the buffer and VM page caches
(which are linked to the vnode cache) to get blown away as well.
The vnode cache should have code to detect stat() style accesses and
avoid blowing away unrelated cached vnodes which have cached data
associated with them, but it's kinda hit-or-miss how well that works.
It is very hard to tune those sorts of algorithms, and when one is
talking about an inode:cache ratio of 22:1 even a good algorithm will
tend to break down.
Generally speaking when caches become inefficient server throughput
goes to hell. You go from e.g. 10uS to access a file to 6mS to
access a file, a 1:600 loss.
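A quick way to see whether this is happening is to watch the vnode
counters while the backup runs. This is just an observation sketch
using the stock FreeBSD sysctls (the ones already mentioned in the
quoted question), not a tuning recommendation:

```shell
# Poll the vnode cache every 5 seconds while the rsync/stat() pass runs.
# If vfs.numvnodes sits pinned at (or near) kern.maxvnodes the whole
# time, the cache is cycling and dragging cached pages down with it.
while sleep 5; do
    sysctl kern.maxvnodes vfs.numvnodes
done
```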
:May be it is possible to increase disk performance somehow? Server has
:a lot of memory. At this time vfs.ufs.dirhash_maxmem = 67108864 (max
:monitored value for vfs.ufs.dirhash_mem was 52290119) and
:kern.maxvnodes = 500000 (max monitored value for vfs.numvnodes was
:450567). Can increasing these (or other) sysctls help? I ask
:because (as you can see) these tunables are already incremented, and I
:am not sure further increment really makes sense.
I'm not sure how this can be best dealt with in FreeBSD. If you are
using ZFS it should be possible to localize or cache the meta-data
associated with those 11 million+ files in some very fast storage
(i.e. like a SSD). Doing so will make the stat() portion of the rsync
go very fast (getting it over with as quickly as possible).
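For example (hypothetical pool and device names -- this assumes a ZFS
pool called 'tank' and a spare SSD showing up as ada2):

```shell
# Add the SSD to the pool as an L2ARC cache device:
zpool add tank cache ada2

# Bias the cache device toward metadata, which is what stat() needs;
# file data then stays out of the SSD and metadata sticks around:
zfs set secondarycache=metadata tank
```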
With UFS the dirhash stuff only caches the directory entries, not the
inode contents (though I'm not 100% positive on that), so it won't help
much. The directory entries are already linear and unless you have
thousands of files in each directory ufs dirhash will not save much
in the way of I/O.
:Also, is it possible to limit disk operations for rm -rf somehow? The
:only idea I have at the moment is to replace rm -rf with 'find |
:slow_down_script | xargs rm' (or use similar patch as for rsync)...
No, unfortunately there isn't much you can do about this, due to
the fact that the files are hardlinked, other than moving the backup
storage entirely off the production server or otherwise determining
why disk I/O to the backup storage is affecting your primary storage
and hacking a fix.
The effect could be indirect... the accesses to the backup
storage are blowing away the system caches and causing the
production storage to get overloaded with I/O. I don't think
there is an easy solution other than to move the work off
the production server entirely.
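That said, the throttling idea from the question is easy enough to
sketch. The batch size, sleep interval, and demo path below are all
arbitrary -- point it at a real expired-backup tree in practice:

```shell
# Hypothetical path, populated here only so the sketch is runnable.
DEMO=/tmp/throttle-rm-demo
mkdir -p "$DEMO/old-backup" && touch "$DEMO/old-backup/a" "$DEMO/old-backup/b"

# Unlink files in batches of 50 with a pause between batches, so the
# stream of unlink()s never monopolizes the disk.
find "$DEMO" -type f -print0 | xargs -0 -n 50 sh -c 'rm -f -- "$@"; sleep 0.1' batch

# The empty directory skeleton left behind is cheap to remove in one go.
rm -rf "$DEMO"
```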
:And also, maybe there are other ways to create incremental backups
:instead of using rsync/hardlinks? I was thinking about generating
:list of changed files with own script and packing it with tar, but I
:did not find a way to remove old backups with such an easy way as it
:is with hardlinks..
:
:Thanks in advance!
:...
:--
:// cronfy
Yes. Use snapshots. ZFS is probably your best bet here in FreeBSDland
as ZFS not only has snapshots it also has a streaming backup feature
that you can use to stream changes from one ZFS filesystem (i.e. on
your production system) to another (i.e. on your backup system).
Both the production system AND the backup system would have to be
running ZFS to make proper use of the feature.
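A sketch of that flow, with made-up pool, dataset, and host names:

```shell
# Take a snapshot on the production box:
zfs snapshot tank/data@2010-10-30

# First pass: send the full stream to the backup box:
zfs send tank/data@2010-10-30 | ssh backup zfs recv backuppool/data

# Thereafter, snapshot again and send only the incremental delta
# between the two snapshots:
zfs snapshot tank/data@2010-10-31
zfs send -i tank/data@2010-10-30 tank/data@2010-10-31 | \
    ssh backup zfs recv backuppool/data
```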
But before you start worrying about all of that I suggest taking the
first step, which is to move the backups entirely off the production
system. There are many ways to handle LAN backups. My personal
favorite (which doesn't help w/ the stat problem but which is easy
to set up) is for the backup system to NFS mount the production system
and periodically 'cpdup' the production system's filesystems over to
the backup system. Then create a snapshot (don't use hardlinks),
and repeat. As a fringe benefit the backup system does not have to
rely on backup management scripts running on the production system...
i.e. the production system can be oblivious to the mechanics of the
backup. And with NFS's rdirplus (NFSv3 here), scanning the production
filesystem via NFS should go pretty quickly.
It is possible for files to be caught mid-change but also fairly
easy to detect the case if it winds up being a problem. And, of
course, more sophisticated methodologies can be built on top.
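The NFS + cpdup + snapshot loop described above might look something
like this on the backup box (host, mount point, and dataset names are
all made up, and cpdup is run with its defaults):

```shell
# Mount the production box read-only over NFSv3, with ReaddirPlus so
# directory scans pull attributes back in bulk:
mount_nfs -o ro,rdirplus prod:/ /mnt/prod

# Mirror into a local ZFS dataset; cpdup only copies what changed:
cpdup /mnt/prod /backups/prod

# Freeze this generation as a snapshot, then clean up:
zfs snapshot backuppool/prod@$(date +%Y%m%d)
umount /mnt/prod
```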
-Matt
Matthew Dillon
<dillon at backplane.com>
More information about the freebsd-hackers
mailing list