bin/53475: cp(1) copies files in reverse order to destination
Bruce Evans
bde at zeta.org.au
Wed Apr 28 00:40:17 PDT 2004
The following reply was made to PR bin/53475; it has been noted by GNATS.
From: Bruce Evans <bde at zeta.org.au>
To: "Dorr H. Clark" <dclark at applmath.scu.edu>
Cc: freebsd-gnats-submit at freebsd.org
Subject: Re: bin/53475: cp(1) copies files in reverse order to destination
Date: Wed, 28 Apr 2004 17:37:25 +1000 (EST)
On Tue, 27 Apr 2004, Dorr H. Clark wrote:
> ...
> -/*
> - * mastercmp --
> - * The comparison function for the copy order. The order is to copy
> - * non-directory files before directory files. The reason for this
> - * is because files tend to be in the same cylinder group as their
> - * parent directory, whereas directories tend not to be. Copying the
> - * files first reduces seeking.
> - */
According to cp -pRv, mastercmp() gets this perfectly backwards: cp
actually copies directories first. It seems to just randomize the
order of regular files; this is presumably because mastercmp() doesn't
distinguish between all pairs of different files and qsort() doesn't
preserve the original order.
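
To illustrate the point about ties, here is a minimal sketch. It is not
the cp.c source: struct entry, partial_cmp(), total_cmp() and
sort_entries() are hypothetical names made up for this example. A
comparator that returns 0 for every file/file and directory/directory
pair leaves the relative order of those entries to qsort(), which makes
no stability guarantee. Breaking ties on the original position gives a
total order, so the input order survives within each group.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an fts entry; this is not the real FTSENT. */
struct entry {
	const char *name;
	int is_dir;	/* nonzero for a directory */
	size_t seq;	/* original position, filled in by sort_entries() */
};

/*
 * Partial comparator: files sort before directories, but any two files
 * (or any two directories) compare equal.  qsort() is not required to
 * be stable, so tied entries may come out in any order.
 */
int
partial_cmp(const void *a, const void *b)
{
	const struct entry *ea = a, *eb = b;

	return ((ea->is_dir != 0) - (eb->is_dir != 0));
}

/*
 * Total comparator: same primary key, with ties broken on the original
 * position.  No two distinct entries compare equal, so the result does
 * not depend on how qsort() treats ties.
 */
int
total_cmp(const void *a, const void *b)
{
	const struct entry *ea = a, *eb = b;
	int d;

	d = (ea->is_dir != 0) - (eb->is_dir != 0);
	if (d != 0)
		return (d);
	return ((ea->seq > eb->seq) - (ea->seq < eb->seq));
}

/* Record each entry's position, then sort files first, input order kept. */
void
sort_entries(struct entry *v, size_t n)
{
	size_t i;

	assert(v != NULL || n == 0);
	for (i = 0; i < n; i++)
		v[i].seq = i;
	qsort(v, n, sizeof(v[0]), total_cmp);
}
```

Calling sort_entries() on the sequence d1, a, d2, b, c yields
a, b, c, d1, d2. Sorting the same array with partial_cmp() only
guarantees that the three files precede the two directories.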
> ...
> As quoted above, the comments in cp.c tell us the function
> mastercmp() is an attempt to improve performance based on
> knowing something about physical disks.
>
> This is an old optimization strategy (it's in the original
> version of cp.c). AFAIK, in the updated BSD filesystem,
> when we copy a file, we don't actually move the
> physical data block of the file but change the information in its
> inode such as the address of its data block and owner.
Copying still involves lots of physical i/o. The difference in
relatively recent versions of ffs is that it doesn't scatter the files
so much by switching the cylinder group too often. IIRC, older versions
switched for every directory.
> The next question is whether deleting mastercmp eliminates
> an optimization. Our testing shows the exact opposite,
> mastercmp is degrading performance. We did several experiments
> with cp -R to measure elapsed time on transfers between devices
> of differing file system types (to avoid UFS2 optimizations).
> Our results show removing mastercmp yields a small performance
> gain (note: we had no SCSI devices available, and second note:
> variability in file system performance seems dominated
> by other factors).
It would be interesting to know if mastercmp() works better if it does
what its comment says it does. I suspect that the backwardsness doesn't
make much difference, but is worse than it used to be because there
is now more competition for space in the same cylinder group. I think
benchmarks that don't descend into subdirs would show that using
mastercmp really is an optimization for that access pattern, but I
think that access pattern is relatively unusual. Optimizing for the
default fts order seems as good as anything.
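
For reference, a rough sketch of what "the default fts order" means in
practice, assuming a system that provides fts(3). The function
walk_default_order() is a hypothetical helper written for this example,
not code from cp.c. Passing NULL as the comparison function to
fts_open() makes fts visit the roots in the order given and everything
else in the order the directory lists it, with no re-sorting.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <assert.h>
#include <fts.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: walk one root with no comparison function and
 * print each path in the order fts returns it.  Stores the number of
 * entries visited in *count.  Returns 0 on success, -1 if fts_open()
 * fails.
 */
int
walk_default_order(char *root, size_t *count)
{
	char *paths[2];
	FTS *ftsp;
	FTSENT *p;

	assert(root != NULL && count != NULL);
	paths[0] = root;
	paths[1] = NULL;
	*count = 0;

	/* NULL compar: roots in argv order, the rest in directory order. */
	ftsp = fts_open(paths, FTS_PHYSICAL | FTS_NOCHDIR, NULL);
	if (ftsp == NULL)
		return (-1);

	while ((p = fts_read(ftsp)) != NULL) {
		/* Skip the second (post-order) visit of each directory. */
		if (p->fts_info == FTS_DP)
			continue;
		printf("%s\n", p->fts_path);
		(*count)++;
	}
	fts_close(ftsp);
	return (0);
}
```

Running this on a source tree shows the order a copy would follow if no
comparison function were passed.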
> M. K. McKusick has indicated in seminars that modern disk drives
> lie to the driver about their physical layouts. The use of
> mastercmp in cp.c is a legacy optimization from a different
> era of disk technology. We recommend removing this call
> from cp.c to address 53475.
Large seeks (especially ones larger than the drive's cache) still
matter, and I think drivers rarely lie about these. cp's attempted
optimization is more about second-guessing what ffs does. I agree
that it shouldn't do this. The file system might not even be ffs.
Bruce