qsort switching to insertsort
Tristan Verniquet
tris_vern at hotmail.com
Sat Nov 26 10:28:00 UTC 2016
We recently hit an issue with qsort where it was taking 2 minutes to sort 500k entries. Changing to mergesort was comparatively instantaneous (which is what was expected). This was trusted data and shouldn't have been failing due to pivot choice.
After investigation, we discovered we'd been caught by a quirk of FreeBSD's qsort where it will switch to insertsort (even for the whole array) if a pass matches a very simplistic heuristic of doing no swaps.
This has previously been written up about in the following links:
http://calmerthanyouare.org/2013/05/31/qsort-shootout.html
and
http://calmerthanyouare.org/2014/06/11/algorithmic-complexity-attacks-and-libc-qsort.html
The articles above focus on the security aspect of quicksort, and that all implementations are vulnerable, however what seems to be missed in the FreeBSD case is how simple it is to hit this (really quite bad) worst case scenario with everyday data, as we did. This discovery is enough for us to not want to use this version of qsort.
FreeBSD does provide heapsort and mergesort as alternatives, but not mergesort_r(). Also, as a developer, having these options gives the impression that choosing qsort will give me a quicksort-like implementation with the trade-offs that come with that, but it is not obvious that I might be given insertsort.
The easiest way forward for us is probably to comment out the offending code.
But I haven't been able to find much discussion on it. I'm not sure how well known the quirk is. I'm not sure of the rationality for it in the first place (obviously a speedup, but whether it was considered alongside the downfalls), or what other peoples opinions are. So I thought I'd ask.
I've done up a simple test case that is similar to our original usage that shows why we were likely to always trigger an insertsort. It has 2 sort fields in a 1->N relationship with the second fields sorted within the first, but the topmost fields slightly out of order. I imagine there are lots of other "almost sorted" scenarios that could end up being quite bad.
http://tverniquet.com/whirlpool/test_qsort/test_qsort.c
Regards,
Tristan
More information about the freebsd-hackers
mailing list