[Bug 287089] [libc] qsort insertion sorts all pre-partitioned arrays, allowing trivial n^2 inputs

From: <bugzilla-noreply_at_freebsd.org>
Date: Tue, 27 May 2025 07:12:08 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=287089

            Bug ID: 287089
           Summary: [libc] qsort insertion sorts all pre-partitioned
                    arrays, allowing trivial n^2 inputs
           Product: Base System
           Version: 14.2-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: bin
          Assignee: bugs@FreeBSD.org
          Reporter: raspberryvivigrette@gmail.com

I found this bug on the Android NDK but since the qsort implementation comes
from FreeBSD and I can confirm it works on FreeBSD, I am reporting this
upstream first.

Basically, if qsort partitions an array without needing to swap anything, it
will do a completely blind insertion sort.

While this gives it an O(n) best case, the O(n^2) worst case goes from
requiring a complicated "quicksort killer" to simply reversing each half of the
array and swapping a few things to break the ninther. However, being insertion
sort, anything that isn't mostly sorted will be O(n^2).

For N = 10, it would look like this.

0 4 3 2 1 5 9 8 7 6

With large arrays, this pattern can stall for seconds (or even a minute if you
have a bad setup like a slow VM), compared to milliseconds in its usual
runtime. 

The easiest solution would be to change it to limit the number of swaps it can
do in this insertion sort pass.

However, if the goal is to harden qsort, it would probably be better to just
switch to pdqsort (like Golang and Rust) which has this optimization, as well
as other optimizations, to guarantee O(n log n) complexity and an O(n) best
case.

Minimal example and benchmarks:

```
#include <stdlib.h>
#include <stdio.h>
#include <time.h>

#define N 100000

static size_t ncmp = 0;
static int buf[N];
static int icmp(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    // (I first created this test on Android which has no qsort_r)
    ++ncmp;
    return (x > y) - (x < y);
}

int main()
{
    // Start with a partitioned array of reversed integers
    for (size_t i = 0; i < N; i++) {
        if (i > N / 2) {
            buf[i] = N - (i - N / 2);
        } else {
            buf[i] = (N / 2) - i;
        }
    } 
    // Add some quick changes to make it so the ninther will pick the middle
    buf[0] = 0;
    buf[N / 2] = N / 2;
    // The array is already partitioned, so qsort will blindly start insertion
sorting it
    // due to the "swap_cnt" check
    // since the array is mostly reversed, it will result in a near-worst case
for insertion sort
    double start = clock();
    qsort(buf, N, sizeof(int), icmp);
    double end = clock();
    printf("qsort: %zu compares, %lf seconds\n", ncmp, (end - start) /
CLOCKS_PER_SEC);
}
```

BSD/NDK does 250010009 comparisons, which is almost exactly insertion sort's
O(n^2/4). 

glibc, which uses mergesort under the hood, does 853930 comparisons.


Simply removing the buf[0] = 0 line will break the insertion sort optimization
(since buf[0] will be equal to the pivot) and result in 525037 comparisons.


Benchmark:

On a QEMU KVM VM on a potato laptop with a Core i7-8650u, 4 cores and 4 GB RAM
allocated, running Arch Linux:

FreeBSD 14.2 RELEASE (VM image):

qsort: 2500100009 compares, 65.835938 seconds

Arch Linux, glibc 2.41:

qsort: 853930 compares, 0.016993 seconds



On a Pixel 6a running Android 15 which uses FreeBSD's qsort and isn't limited
by slow virtualization:

qsort: 2500100009 compares, 18.901680 seconds

Running glibc under proot:

qsort: 853930 compares, 0.009387 seconds

-- 
You are receiving this mail because:
You are the assignee for the bug.