Multithreaded qsort(3)

Jeremy Messenger mezz7 at cox.net
Thu Mar 15 13:34:11 UTC 2007


On Thu, 15 Mar 2007 03:42:21 -0500, Diomidis Spinellis <dds at aueb.gr> wrote:

> Greetings,
>
> I've added multhread support to our qsort implementation.  You can see  
> the code at <http://people.freebsd.org/~dds/qsort_mt.c> and some  
> performance figures at the end of this email.  I propose to add it to  
> our libc.  I need your opinions on the interface, and also any comments  
> you may have on the code.  I see the following design dimensions:

That remind me other days when I have readed in one of blog that have some  
good info about sorts. I will posting a few of links in here in case if  
anyone is insteresting to read. ;-)

Quick Sort (qsort()):
http://jeffreystedfast.blogspot.com/2007/03/quick-sort.html

25 Byte Sort Routine:
http://jeffreystedfast.blogspot.com/2007/03/25-byte-sort-routine.html

Merge Sort:
http://jeffreystedfast.blogspot.com/2007/02/merge-sort.html

A lot of more sorts:
Shell Sort
Binary Insertion Sort
Insertion Sort
Bubble Sort

http://jeffreystedfast.blogspot.com/search/label/sorting

Cheers,
Mezz

> 1. Naming and availability
> --------------------------
> - Option 1.1: Replace qsort with this implementation
> * Pros: Programs gain performance without any source code change;  
> abstraction (number of processors/cores should be invisible, just as the  
> CPU architecture or disk drive interface)
> * Cons: May confuse multithreaded programs; programs require linking  
> with pthreads library; programs whose compare function is not thread  
> safe will break
>
> - Option 1.2: Name it qsort_mt
> * Pros: POLA; programs can tune the call according to their need
> * Cons: Programs require source code changes; we violate abstraction;  
> namespace polution
>
> My proposal: option 1.2: name it qsort_mt and make it available in a  
> separate library (libc_mt).
>
>
> 2. Interface
> ------------
> - Option 2.1: qsort compatible
> * Pros: Drop in replacement; doesn't need programmer understanding;  
> compatible with option 1.1
> * Cons: Can't be tuned for a specific program or workload
>
> - Option 2.2: Add nthreads parameter, specifying the maximum number of  
> threads
> * Pros: Multithreaded programs can tune load balancing; all programs can  
> optimise nthreads according to the workload characteristics.
> * Cons: Libc routines are generaly expected to Do he Right Thing,  
> without handholding; must agree on mechanism for determining the number  
> of threads; the benefits of tuning are unlikely to be dramatic.
>
> - Option 2.3: Add forkelem, specifying minimum number of elements for a  
> new thread (below forkelem, I use single-threaded qsort).
> * Pros: programs can optimise nthreads according to the workload  
> characteristics.
> * Cons: Libc routines are generaly expected to Do he Right Thing,  
> without handholding; the benefits of tuning are unlikely to be dramatic.
>
> - Option 2.4: Have the library tune itsself for a given system or  
> individual programs by measuring different values
> * Pros: higher performance when the tuned values are reused
> * Cons: Lack of prior art; performance drop when determining tuning  
> values; security considerations if values are cached; complexity
>
> (Options 2.2 and 2.3 are orthogonal; options 2.1 and 2.4 are orthogonal)
>
> My proposal: go for 2.1, based on the way C library routines work.
>
> 3. Determining the number of threads
> ------------------------------------
> - Option 3.1: NPROCS environment variable
> * Pros: prior art: CrayOS, BSD/OS; users can tune it.
> * Cons: May be too blunt; requires setting by user or administrator
>
> - Option 3.[234]: hw.ncpu or kern.smp.cpus sysctl, or pmc_ncpu()
> * Pros: automatic, right for the underlying hardware
> * Cons: even blunter
>
> - Option 3.5: Add library interface for the above as int nthreads_mt()
> * Pros: we abstract policy; programs can replace it; reusability;  
> extensibility.
> * Cons: namespace polution; it is not clear that a single number is  
> correct for all uses of a program.
>
> (All the above options are orthogonal)
>
> My proposal: Add library interface using NPROCS and falling back to a  
> sysctl variable.  This interface can be extended in the future.
>
> Performance figures
> -------------------
> Reported times are elapsed, user, and system seconds.
>
> $ uname -a
> FreeBSD sledge.freebsd.org 7.0-CURRENT FreeBSD 7.0-CURRENT #924: Wed Mar  
> 14 14:25:07 UTC 2007  
> root at sledge.freebsd.org:/h/src/sys/amd64/compile/SLEDGE  amd64
>
> $ qsort_mt -?
> qsort_mt: option requires an argument -- h
> usage: qsort_mt [-stv] [-f forkelements] [-h threads] [-n elements]
>          -l      Run the libc version of qsort
>          -s      Test with 20-byte strings, instead of integers
>          -t      Print timing results
>          -v      Verify the integer results
> Defaults are 1e7 elements, 2 threads, 100 fork elements
>
>
> $ gcc -DTEST -O qsort_mt.c -o qsort_mt -lthr -lpmc
>
> $ qsort_mt -t -h 2 -n 100000000 -l	# Integers; qsort(3)
> 46.5 46.2 0.314
> $ ./qsort_mt  -t -h 2 -n 100000000	# Integers; qsort_mt
> 27.5 46.3 0.301
> $ qsort_mt -t -h 2 -n 10000000 -s -l	# Strings; qsort(3)
> 41.3 40.1 1.2
> $ ./qsort_mt  -t -h 2 -n 10000000 -s	# Strings; qsort_mt
> 26.7 40.1 1.16
>
> $ gcc -DTEST -O qsort_mt.c -o qsort_mt -lpthread -lpmc
> $ qsort_mt -t -h 2 -n 100000000		# Integers; qsort_mt
> 28 47.1 0.368
> $ qsort_mt -t -h 2 -n 10000000 -s	# Strings; qsort_mt
> 27.1 40.6 1.06
>
>
> Thanks,
>
> Diomidis - dds@
> http://www.spinellis.gr


-- 
mezz7 at cox.net  -  mezz at FreeBSD.org
FreeBSD GNOME Team  -  FreeBSD Multimedia Hat (ports, not src)
http://www.FreeBSD.org/gnome/  -  gnome at FreeBSD.org
http://wiki.freebsd.org/multimedia  -  multimedia at FreeBSD.org


More information about the freebsd-threads mailing list