misc/58860: Improvement of radixsort() implementation
Markus Bjartveit Kruger
markusk at pvv.ntnu.no
Mon Nov 3 00:10:14 PST 2003
>Number: 58860
>Category: misc
>Synopsis: Improvement of radixsort() implementation
>Confidential: no
>Severity: non-critical
>Priority: low
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: change-request
>Submitter-Id: current-users
>Arrival-Date: Mon Nov 03 00:10:12 PST 2003
>Closed-Date:
>Last-Modified:
>Originator: Markus Bjartveit Kruger
>Release: FreeBSD 4.8-RELEASE i386
>Organization:
Programvareverkstedet
>Environment:
System: FreeBSD proto.pvv.ntnu.no 4.8-RELEASE FreeBSD 4.8-RELEASE #0: Tue Sep 16 21:03:28 CEST 2003 root at bacchus.pvv.ntnu.no:/local/obj/usr/src/sys/PROTO i386
>Description:
I've found a possible improvement to the implementation of radixsort()
located in /usr/src/lib/libc/stdlib/radixsort.c, which will make the
implementation more efficient when sorting data sets containing many
strings that share a common prefix.
The improvement consists in checking for bins where all strings have
the same character at the current position of the radix sort, and
moving on to the next character instead of needlessly shuffling the
strings in the bin around. When I used the modified radixsort() to
sort a web log file, which typically contains many strings with common
prefixes (IP addresses), I registered a speedup of approximately 15%.
On a constructed dataset where all strings had a long common prefix,
the speedup was 50%.
>How-To-Repeat:
Compile a program with the old and new version of radixsort() and compare
processing times. Remember to set appropriate optimization options, as
the efficiency gain can be lost otherwise.
>Fix:
--- radixsort-patch begins here ---
--- radixsort.c Tue Mar 11 12:39:57 1997
+++ radixsort.c.patch Fri Oct 31 12:55:10 2003
@@ -175,6 +175,17 @@
}
/*
+ * Special case: if all strings have the same
+ * character at position i, move on to the next
+ * character.
+ */
+ if (nc == 1 && count[bmin] == n) {
+ push(a, n, i+1);
+ nc = count[bmin] = 0;
+ continue;
+ }
+
+ /*
* Set top[]; push incompletely sorted bins onto stack.
* top[] = pointers to last out-of-place element in bins.
* count[] = counts of elements in bins.
--- radixsort-patch ends here ---
>Release-Note:
>Audit-Trail:
>Unformatted:
More information about the freebsd-bugs
mailing list