bin/143732: [mtree] mtree does a full hierarchy walk when requested to just update structure (-u -e)

David Naylor naylor.b.david at gmail.com
Wed Feb 10 19:30:02 UTC 2010


>Number:         143732
>Category:       bin
>Synopsis:       [mtree] mtree does a full hierarchy walk when requested to just update structure (-u -e)
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Feb 10 19:30:01 UTC 2010
>Closed-Date:
>Last-Modified:
>Originator:     David Naylor
>Release:        FreeBSD 8
>Organization:
Private
>Environment:
FreeBSD dragon.dg 8.0-STABLE FreeBSD 8.0-STABLE #0: Wed Jan 20 20:06:26 SAST 2010     root at dragon.dg:/tmp/usr/src/sys/GENERIC  amd64

>Description:
mtree does a full hierarchy walk when requested to update a (directory) structure.  This is unnecessary as mtree need only visit the directories or files in question.  This can have particular performance impact if the path's structure is extensive (i.e. /usr/local with 1000+ ports installed) and the structure to update is relatively short or when IO is slow (i.e. stacked unionfs).  

This has real world impact as this mtree command is run every time a port is installed.  
>How-To-Repeat:
The following script simulates slow IO (using stacked unionfs).  Note that the time mtree takes is similar to find when unpatched but a fraction of the time (between 60% to 80% quicker [1 - new/old]).  I expect this performance to be better when the path is populated with extra content (e.g. installed ports).  

The script should be run as root in an empty root (the number of overlayed unionfs are specified as the first command, please see http://lists.freebsd.org/pipermail/freebsd-current/2010-January/015040.html for possible problems [system freeze, kstack overflow]).  

On my computer with 155 overlayed unionfs the performance I got:
# ./unionfs_mtree.sh 155 | tail -n 1
mtree 155:       123.77 real         0.03 user       112.67 sys (unpatched)
mtree 155:        29.94 real         0.00 user        29.81 sys (patched)

The script (unionfs_mtree.sh):

#!/bin/sh
set -e

mount_unionfs() {

  count=1
  while [ $count -lt $1 ]
  do
    mount -rt unionfs -o noatime $count top
    count=$(($count + 1))
  done

}

umount_unionfs() {

  count=$1
  while [ $count -gt 1 ]
  do
    count=$(($count - 1))
    umount top
  done

}

mkdir -p top

[ $# -eq 1 ] || (echo "Please specify directory count!!!"; false)

for i in `jot $1`
do
  mount_unionfs $i
  mkdir -p $i
  mount -t unionfs -o noatime $i top

  set +e
  trap "true" INT TERM EXIT
  echo -n "find $i:  "
  time find top > /dev/null
  status=$?

  if [ $status -eq 0 ]
  then
    echo -n "mtree $i: "
    time /usr/sbin/mtree -U -f /usr/ports/Templates/BSD.local.dist -d -e -p top
    status=$?
  fi
  trap - INT TERM EXIT
  set -e

  umount top
  umount_unionfs $i
  [ $status -eq 0 ] || (echo "Terminated"; false)
done

>Fix:
The attached patch prevents a hierarchy walk if only an update is requested.  It simulated the structure returned from fts_read as required by compare.  

An improvement in performance could be implemented using chdir (as fts_* does) however that would require a pervasive change to existing code.  

Patch attached with submission follows:

--- /usr/src/usr.sbin/mtree/verify.c	2010-02-07 15:07:28.000000000 +0200
+++ verify.c	2010-02-07 15:04:10.000000000 +0200
@@ -50,17 +50,23 @@
 static NODE *root;
 static char path[MAXPATHLEN];
 
-static void	miss(NODE *, char *);
+static int	miss(NODE *, char *);
+static int	check(NODE *, char *);
 static int	vwalk(void);
 
 int
 mtree_verifyspec(FILE *fi)
 {
-	int rval;
+	int rval = 0;
 
 	root = mtree_readspec(fi);
-	rval = vwalk();
-	miss(root, path);
+	/*
+	 * No need to walk entire tree if we are only updating the structure
+	 * and extra files are ignored.
+	 */
+	if (!(uflag && eflag))
+		rval = vwalk();
+	rval |= miss(root, path);
 	return (rval);
 }
 
@@ -155,15 +161,47 @@
 	return (rval);
 }
 
-static void
+static int
+check(NODE *p, char *tail)
+{
+	FTSENT fts;
+	struct stat fts_stat;
+
+	strcpy(tail, p->name);
+
+	/*
+	 * It is assumed that compare() only requires fts_accpath and fts_statp
+	 * fields in the FTSENT structure.
+	 */
+	fts.fts_accpath = path;
+	fts.fts_statp = &fts_stat;
+
+	if (stat(path, fts.fts_statp))
+		return (0);
+
+	p->flags |= F_VISIT;
+	if ((p->flags & F_NOCHANGE) == 0 && compare(p->name, p, &fts))
+		return (MISMATCHEXIT);
+	else
+		return (0);
+
+	/*
+	 * tail is not restored to '\0' as the next time tail (or path) is used
+	 * is with a strcpy (thus overriding the '\0').  See +19 lines below.
+	 */
+}
+
+static int
 miss(NODE *p, char *tail)
 {
 	int create;
 	char *tp;
 	const char *type, *what;
-	int serr;
+	int serr, rval = 0;
 
 	for (; p; p = p->next) {
+		if (uflag && eflag)
+			rval |= check(p, tail);
 		if (p->flags & F_OPT && !(p->flags & F_VISIT))
 			continue;
 		if (p->type != F_DIR && (dflag || p->flags & F_VISIT))
@@ -256,4 +294,5 @@
 			(void)printf("%s: file flags not set: %s\n",
 			    path, strerror(errno));
 	}
+	return (rval);
 }


>Release-Note:
>Audit-Trail:
>Unformatted:


More information about the freebsd-bugs mailing list