bin/51151: du hardlinkmatching is slow - fix included
Peter van Dijk
peter at dataloss.nl
Sat Apr 19 00:50:12 PDT 2003
>Number: 51151
>Category: bin
>Synopsis: du hardlinkmatching is slow - fix included
>Confidential: no
>Severity: non-critical
>Priority: low
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: change-request
>Submitter-Id: current-users
>Arrival-Date: Sat Apr 19 00:50:10 PDT 2003
>Closed-Date:
>Last-Modified:
>Originator: Peter van Dijk
>Release: FreeBSD 4.7-RELEASE-p6 i386
>Organization:
-
>Environment:
System: FreeBSD useful.dataloss.nl 4.7-RELEASE-p6 FreeBSD 4.7-RELEASE-p6 #3: Fri Feb 28 21:41:55 CET 2003 root at useful.home.dataloss.nl:/home2/usr2/obj/home2/usr2/src/sys/USEFUL i386
>Description:
Running du -hs on a set of directories with lots of hardlinks between them
is horribly slow. The reason is that the inode-numbers are stored in a flat
array which is searched in a linear way.
>How-To-Repeat:
Create lots of hardlinks and run du on your directory.
>Fix:
This patch modifies du to use the balanced tree methods provided by libisc.
On my testset it reduced du runtime from around 4 hours to just under 2
minutes.
I'm sorry it depends on libisc, but I could not find any suitable
datastructures in the libc. I am open to suggestions.
--- /usr/src/usr.bin/du/du.c Wed Sep 19 21:19:48 2001
+++ du.c Thu Apr 17 15:16:17 2003
@@ -63,6 +63,7 @@
#include <string.h>
#include <sysexits.h>
#include <unistd.h>
+#include <isc/tree.h>
#define KILO_SZ(n) (n)
#define MEGA_SZ(n) ((n) * (n))
@@ -104,6 +105,8 @@
void ignoreclean __P((void));
int ignorep __P((FTSENT *));
+tree *linktree;
+
int
main(argc, argv)
int argc;
@@ -232,6 +235,8 @@
blocksize /= 512;
rval = 0;
+
+ tree_init(&linktree);
if ((fts = fts_open(argv, ftsoptions, NULL)) == NULL)
err(1, "fts_open");
@@ -308,36 +313,33 @@
exit(rval);
}
-
-typedef struct _ID {
- dev_t dev;
- ino_t inode;
-} ID;
-
-
int
linkchk(p)
FTSENT *p;
{
- static ID *files;
- static int maxfiles, nfiles;
- ID *fp, *start;
ino_t ino;
dev_t dev;
+ char *s;
+
+ if((s=malloc(32))==0)
+ err(1, "malloc");
ino = p->fts_statp->st_ino;
dev = p->fts_statp->st_dev;
- if ((start = files) != NULL)
- for (fp = start + nfiles - 1; fp >= start; --fp)
- if (ino == fp->inode && dev == fp->dev)
- return (1);
-
- if (nfiles == maxfiles && (files = realloc((char *)files,
- (u_int)(sizeof(ID) * (maxfiles += 128)))) == NULL)
- errx(1, "can't allocate memory");
- files[nfiles].inode = ino;
- files[nfiles].dev = dev;
- ++nfiles;
+
+ snprintf(s, 32, "%u:%u", dev, ino);
+
+ if(tree_srch(&linktree, strcmp, s)==0)
+ {
+ tree_add(&linktree, strcmp, s, 0);
+ return(0);
+ }
+ else
+ {
+ free(s);
+ return(1);
+ }
+
return (0);
}
>Release-Note:
>Audit-Trail:
>Unformatted:
More information about the freebsd-bugs
mailing list