Multithreaded qsort(3)
Diomidis Spinellis
dds at aueb.gr
Thu Mar 15 12:12:13 UTC 2007
Greetings,
I've added multhread support to our qsort implementation. You can see
the code at <http://people.freebsd.org/~dds/qsort_mt.c> and some
performance figures at the end of this email. I propose to add it to
our libc. I need your opinions on the interface, and also any comments
you may have on the code. I see the following design dimensions:
1. Naming and availability
--------------------------
- Option 1.1: Replace qsort with this implementation
* Pros: Programs gain performance without any source code change;
abstraction (number of processors/cores should be invisible, just as the
CPU architecture or disk drive interface)
* Cons: May confuse multithreaded programs; programs require linking
with pthreads library; programs whose compare function is not thread
safe will break
- Option 1.2: Name it qsort_mt
* Pros: POLA; programs can tune the call according to their need
* Cons: Programs require source code changes; we violate abstraction;
namespace polution
My proposal: option 1.2: name it qsort_mt and make it available in a
separate library (libc_mt).
2. Interface
------------
- Option 2.1: qsort compatible
* Pros: Drop in replacement; doesn't need programmer understanding;
compatible with option 1.1
* Cons: Can't be tuned for a specific program or workload
- Option 2.2: Add nthreads parameter, specifying the maximum number of
threads
* Pros: Multithreaded programs can tune load balancing; all programs can
optimise nthreads according to the workload characteristics.
* Cons: Libc routines are generaly expected to Do he Right Thing,
without handholding; must agree on mechanism for determining the number
of threads; the benefits of tuning are unlikely to be dramatic.
- Option 2.3: Add forkelem, specifying minimum number of elements for a
new thread (below forkelem, I use single-threaded qsort).
* Pros: programs can optimise nthreads according to the workload
characteristics.
* Cons: Libc routines are generaly expected to Do he Right Thing,
without handholding; the benefits of tuning are unlikely to be dramatic.
- Option 2.4: Have the library tune itsself for a given system or
individual programs by measuring different values
* Pros: higher performance when the tuned values are reused
* Cons: Lack of prior art; performance drop when determining tuning
values; security considerations if values are cached; complexity
(Options 2.2 and 2.3 are orthogonal; options 2.1 and 2.4 are orthogonal)
My proposal: go for 2.1, based on the way C library routines work.
3. Determining the number of threads
------------------------------------
- Option 3.1: NPROCS environment variable
* Pros: prior art: CrayOS, BSD/OS; users can tune it.
* Cons: May be too blunt; requires setting by user or administrator
- Option 3.[234]: hw.ncpu or kern.smp.cpus sysctl, or pmc_ncpu()
* Pros: automatic, right for the underlying hardware
* Cons: even blunter
- Option 3.5: Add library interface for the above as int nthreads_mt()
* Pros: we abstract policy; programs can replace it; reusability;
extensibility.
* Cons: namespace polution; it is not clear that a single number is
correct for all uses of a program.
(All the above options are orthogonal)
My proposal: Add library interface using NPROCS and falling back to a
sysctl variable. This interface can be extended in the future.
Performance figures
-------------------
Reported times are elapsed, user, and system seconds.
$ uname -a
FreeBSD sledge.freebsd.org 7.0-CURRENT FreeBSD 7.0-CURRENT #924: Wed Mar
14 14:25:07 UTC 2007
root at sledge.freebsd.org:/h/src/sys/amd64/compile/SLEDGE amd64
$ qsort_mt -?
qsort_mt: option requires an argument -- h
usage: qsort_mt [-stv] [-f forkelements] [-h threads] [-n elements]
-l Run the libc version of qsort
-s Test with 20-byte strings, instead of integers
-t Print timing results
-v Verify the integer results
Defaults are 1e7 elements, 2 threads, 100 fork elements
$ gcc -DTEST -O qsort_mt.c -o qsort_mt -lthr -lpmc
$ qsort_mt -t -h 2 -n 100000000 -l # Integers; qsort(3)
46.5 46.2 0.314
$ ./qsort_mt -t -h 2 -n 100000000 # Integers; qsort_mt
27.5 46.3 0.301
$ qsort_mt -t -h 2 -n 10000000 -s -l # Strings; qsort(3)
41.3 40.1 1.2
$ ./qsort_mt -t -h 2 -n 10000000 -s # Strings; qsort_mt
26.7 40.1 1.16
$ gcc -DTEST -O qsort_mt.c -o qsort_mt -lpthread -lpmc
$ qsort_mt -t -h 2 -n 100000000 # Integers; qsort_mt
28 47.1 0.368
$ qsort_mt -t -h 2 -n 10000000 -s # Strings; qsort_mt
27.1 40.6 1.06
Thanks,
Diomidis - dds@
http://www.spinellis.gr
More information about the freebsd-arch
mailing list