bin/53475: cp(1) copies files in reverse order to destination

Bruce Evans bde at zeta.org.au
Wed Apr 28 00:40:17 PDT 2004


The following reply was made to PR bin/53475; it has been noted by GNATS.

From: Bruce Evans <bde at zeta.org.au>
To: "Dorr H. Clark" <dclark at applmath.scu.edu>
Cc: freebsd-gnats-submit at freebsd.org
Subject: Re: bin/53475: cp(1) copies files in reverse order to destination
Date: Wed, 28 Apr 2004 17:37:25 +1000 (EST)

 On Tue, 27 Apr 2004, Dorr H. Clark wrote:
 
 > ...
 >  -/*
 >  - * mastercmp --
 >  - *     The comparison function for the copy order.  The order is to copy
 >  - *     non-directory files before directory files.  The reason for this
 >  - *     is because files tend to be in the same cylinder group as their
 >  - *     parent directory, whereas directories tend not to be.  Copying the
 >  - *     files first reduces seeking.
 >  - */
 
 According to cp -pRv, mastercmp() gets this perfectly backwards: cp
 actually copies directories first.  It seems to just randomize the
 order of regular files; this is presumably because mastercmp() doesn't
 distinguish between all pairs of different files and qsort() doesn't
 preserve the original order.
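 
 A minimal sketch of the tie problem, not the code in cp.c: struct entry
 and both function names are hypothetical stand-ins for FTSENT and
 mastercmp().  The comparison puts non-directories first, as the comment
 claims, and breaks ties on the position before sorting.  Without that
 last comparison, any two regular files compare equal and qsort() is
 free to emit them in any order.

```c
#include <stdlib.h>

/* Hypothetical stand-in for an FTSENT; not the real fts(3) structure. */
struct entry {
	const char *name;
	int is_dir;	/* nonzero for a directory */
	size_t seq;	/* position before sorting, used only to break ties */
};

/*
 * Order non-directories before directories.  Entries of the same kind
 * keep their incoming order because of the final seq comparison.
 */
static int
files_first_cmp(const void *a, const void *b)
{
	const struct entry *ea = a, *eb = b;
	int da = ea->is_dir != 0, db = eb->is_dir != 0;

	if (da != db)
		return (da ? 1 : -1);
	if (ea->seq != eb->seq)
		return (ea->seq < eb->seq ? -1 : 1);
	return (0);
}

/* Record the incoming order, then sort; the result is deterministic. */
static void
sort_files_first(struct entry *v, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		v[i].seq = i;
	qsort(v, n, sizeof(v[0]), files_first_cmp);
}
```

 Calling sort_files_first() on an array moves the directories to the
 end and leaves the regular files in the order they arrived in.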
 
 > ...
 >  As quoted above, the comments in cp.c tell us the function
 >  mastercmp() is an attempt to improve performance based on
 >  knowing something about physical disks.
 >
 >  This is an old optimization strategy (it's in the original
 >  version of cp.c).  AFAIK, in the updated BSD filesystem,
 >  when we copy a file, we don't actually move the
 >  physical data block of the file but change the information in its
 >  inode such as the address of its data block and owner.
 
 Copying still involves lots of physical i/o.  The difference in
 relatively recent versions of ffs is that they don't scatter the files
 so much by switching the cylinder group too often.  IIRC, older
 versions switched for every directory.
 
 >  The next question is whether deleting mastercmp eliminates
 >  an optimization.  Our testing shows the exact opposite,
 >  mastercmp is degrading performance.  We did several experiments
 >  with cp -R to measure elapsed time on transfers between devices
 >  of differing file system types (to avoid UFS2 optimizations).
 >  Our results show removing mastercmp yields a small performance
 >  gain (note: we had no SCSI devices available, and second note:
 >  variability in file system performance seems dominated
 >  by other factors).
 
 It would be interesting to know whether mastercmp() works better if it
 does what its comment says it does.  I suspect that the reversed order
 doesn't make much difference, but is worse than it used to be because
 there is now more competition for space in the same cylinder group.  I
 think benchmarks that don't descend into subdirs would show that using
 mastercmp really is an optimization for that access pattern, but I
 think that access pattern is relatively unusual.  Optimizing for the
 default fts order seems as good as anything.
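 
 For reference, a sketch of a traversal in the default fts order,
 assuming that dropping mastercmp amounts to passing NULL as the
 comparison function to fts_open().  With a NULL comparison function,
 fts(3) visits the root paths in the order given and everything else in
 the order listed in the directory.  The function name
 walk_default_order() is hypothetical, and this is not the loop in cp.c.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <fts.h>
#include <stdio.h>

/*
 * Walk the tree rooted at path in the default fts(3) order and print
 * each directory (pre-order) and regular file.  Returns the number of
 * entries printed, or -1 if fts_open() fails.
 */
static long
walk_default_order(char *path)
{
	char *paths[] = { path, NULL };
	FTS *ftsp;
	FTSENT *p;
	long n = 0;

	/* NULL comparison function: no user-defined ordering is applied. */
	ftsp = fts_open(paths, FTS_PHYSICAL | FTS_NOCHDIR, NULL);
	if (ftsp == NULL)
		return (-1);
	while ((p = fts_read(ftsp)) != NULL) {
		switch (p->fts_info) {
		case FTS_D:	/* directory, visited before its contents */
		case FTS_F:	/* regular file */
			printf("%s\n", p->fts_path);
			n++;
			break;
		default:	/* skip post-order visits, errors, other types */
			break;
		}
	}
	fts_close(ftsp);
	return (n);
}
```

 Running this on a source tree shows the order in which files would be
 visited with no sorting applied, which can be compared against the
 output of cp -pRv.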
 
 >  M. K. McKusick has indicated in seminars that modern disk drives
 >  lie to the driver about their physical layouts.  The use of
 >  mastercmp in cp.c is a legacy optimization from a different
 >  era of disk technology.  We recommend removing this call
 >  from cp.c to address 53475.
 
 Large seeks (especially ones larger than the drive's cache) still
 matter, and I think drives rarely lie about these.  cp's attempted
 optimization is more about second-guessing what ffs does.  I agree
 that it shouldn't do this.  The file system might not even be ffs.
 
 Bruce

