bin/51151: du hardlinkmatching is slow - fix included

Peter van Dijk peter at dataloss.nl
Sat Apr 19 00:50:12 PDT 2003


>Number:         51151
>Category:       bin
>Synopsis:       du hardlinkmatching is slow - fix included
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          change-request
>Submitter-Id:   current-users
>Arrival-Date:   Sat Apr 19 00:50:10 PDT 2003
>Closed-Date:
>Last-Modified:
>Originator:     Peter van Dijk
>Release:        FreeBSD 4.7-RELEASE-p6 i386
>Organization:
-
>Environment:
System: FreeBSD useful.dataloss.nl 4.7-RELEASE-p6 FreeBSD 4.7-RELEASE-p6 #3: Fri Feb 28 21:41:55 CET 2003 root at useful.home.dataloss.nl:/home2/usr2/obj/home2/usr2/src/sys/USEFUL i386


>Description:

Running du -hs on a set of directories with lots of hardlinks between them
is horribly slow. The reason is that the inode-numbers are stored in a flat
array which is searched in a linear way.

>How-To-Repeat:

Create lots of hardlinks and run du on your directory.

>Fix:

This patch modifies du to use the balanced tree methods provided by libisc.
On my testset it reduced du runtime from around 4 hours to just under 2
minutes.

I'm sorry it depends on libisc, but I could not find any suitable
datastructures in the libc. I am open to suggestions.

--- /usr/src/usr.bin/du/du.c	Wed Sep 19 21:19:48 2001
+++ du.c	Thu Apr 17 15:16:17 2003
@@ -63,6 +63,7 @@
 #include <string.h>
 #include <sysexits.h>
 #include <unistd.h>
+#include <isc/tree.h>
 
 #define	KILO_SZ(n) (n)
 #define	MEGA_SZ(n) ((n) * (n))
@@ -104,6 +105,8 @@
 void		ignoreclean __P((void));
 int		ignorep __P((FTSENT *));
 
+tree *linktree;
+
 int
 main(argc, argv)
 	int argc;
@@ -232,6 +235,8 @@
 	blocksize /= 512;
 
 	rval = 0;
+
+	tree_init(&linktree);
 	
 	if ((fts = fts_open(argv, ftsoptions, NULL)) == NULL)
 		err(1, "fts_open");
@@ -308,36 +313,33 @@
 	exit(rval);
 }
 
-
-typedef struct _ID {
-	dev_t	dev;
-	ino_t	inode;
-} ID;
-
-
 int
 linkchk(p)
 	FTSENT *p;
 {
-	static ID *files;
-	static int maxfiles, nfiles;
-	ID *fp, *start;
 	ino_t ino;
 	dev_t dev;
+	char *s;
+
+	if((s=malloc(32))==0)
+		err(1, "malloc");
 
 	ino = p->fts_statp->st_ino;
 	dev = p->fts_statp->st_dev;
-	if ((start = files) != NULL)
-		for (fp = start + nfiles - 1; fp >= start; --fp)
-			if (ino == fp->inode && dev == fp->dev)
-				return (1);
-
-	if (nfiles == maxfiles && (files = realloc((char *)files,
-	    (u_int)(sizeof(ID) * (maxfiles += 128)))) == NULL)
-		errx(1, "can't allocate memory");
-	files[nfiles].inode = ino;
-	files[nfiles].dev = dev;
-	++nfiles;
+
+	snprintf(s, 32, "%u:%u", dev, ino);
+
+	if(tree_srch(&linktree, strcmp, s)==0)
+	{
+		tree_add(&linktree, strcmp, s, 0);
+		return(0);
+	}
+	else
+	{
+		free(s);
+		return(1);
+	}
+
 	return (0);
 }
 

>Release-Note:
>Audit-Trail:
>Unformatted:


More information about the freebsd-bugs mailing list