[Bug 199587] libc strncmp() performance

bugzilla-noreply at freebsd.org bugzilla-noreply at freebsd.org
Mon Apr 27 16:17:48 UTC 2015


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=199587

--- Comment #1 from Eitan Adler <eadler at FreeBSD.org> ---
(adding to the bug)

From bde:

This basically confuses the compiler into producing not-so-good code
in a different way.

Your implementation is a bit cleaner since it doesn't arrange the source
code in a way that it thinks will be good for the object code.  This
results in it being slower for old compilers, faster for some in-between
compilers, and no different for new compilers.  However, all the C versions
are now faster than the asm versions on amd64 and i386 on 2 i7 CPUs.  I
added tests for the latter, and sprinkled some volatiles to stop the
compiler optimizing away the whole loop for the asm (libc) versions.
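
For reference, a plain C strncmp() that makes no attempt to steer the
generated code might look like the sketch below.  This is only an
illustration: the name simple_strncmp is hypothetical, and the code is
neither the submitted patch nor the libc source.

```c
#include <stddef.h>

/*
 * Hypothetical illustration only: a straightforward strncmp() that
 * leaves all code-generation decisions to the compiler.
 */
int
simple_strncmp(const char *s1, const char *s2, size_t n)
{
	/* Compare as unsigned char, as the C standard requires. */
	const unsigned char *p1 = (const unsigned char *)s1;
	const unsigned char *p2 = (const unsigned char *)s2;

	for (; n != 0; n--, p1++, p2++) {
		if (*p1 != *p2)
			return (*p1 - *p2);
		if (*p1 == '\0')
			break;		/* both strings ended together */
	}
	return (0);
}
```

Whether a compiler turns this into better or worse object code than a
hand-arranged version depends on the compiler and flags, which is what
the numbers below try to show.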

i386, 4790K @ 4.28GHz:
    gcc-3.3.3 -O (but no -march etc. complications):
        10.0 Gcycles  --  libc strncmp() (asm source, external linkage)
        10.1 Gcycles  --  libc strncmp() (copy of the C version)
        11.3 Gcycles  --  My Implementation

    gcc-3.3.3 -O2:
        12.0 Gcycles  --  libc strncmp() (asm source, external linkage)
         9.4 Gcycles  --  libc strncmp() (copy of the C version)
        10.2 Gcycles  --  My Implementation
    libc asm strncmp() really was made 20% slower by increasing the
    optimization level from -O to -O2, although strncmp() itself didn't
    change.  This might be due to the loop being poorly aligned.
    Tuning with -march might be needed to avoid 20% differences, so the
    mere 10% differences in these tests might be noise.  (I didn't bother
    giving many data points, since noise from rerunning the tests is
    much smaller than 10-20% differences from tuning.)

    gcc-4.2.1 -O:
        11.4 Gcycles  --  libc strncmp() (asm source, external linkage)
        13.1 Gcycles  --  libc strncmp() (copy of the C version)
        12.1 Gcycles  --  My Implementation
    gcc-4.2.1 -O generates much slower code than gcc-3.3.3 -O, though
    the slowdown is smaller for your implementation.

    gcc-4.2.1 -O2:
        10.1 Gcycles  --  libc strncmp() (asm source, external linkage)
         9.5 Gcycles  --  libc strncmp() (copy of the C version)
         9.3 Gcycles  --  My Implementation
    gcc-4.2.1 -O2 is OK.

amd64, Xeon 5650 @ 2.67GHz:
    clang -O:
    The calls to *strcmp() were almost all optimized away.  I fixed
    this by replacing str1 in the call with str1 + v, where v is a
    volatile int with value 0.
        13.8 Gcycles  --  libc strncmp() (C source, external linkage)
        13.8 Gcycles  --  libc strncmp() (copy of the C version)
        13.8 Gcycles  --  My Implementation
    libc asm strncmp() is of interest here although it doesn't exist --
    if it existed, then it would be more bogus than on i386, since amd64
    doesn't run on the 1990-era CPUs where the asm version was probably
    faster.  The asm i386 version was tuned for original i386's and has
    barely changed since then.  Just as well, since it would be very messy
    with tuning for 10-20 generations of CPUs with several classes of CPU
    per generation.  amd64 libc string functions used to be missing all
    silly optimizations like this, but libc now optimizes the
    almost-never-used function stpcpy(), and its asm versions of strcat()
    and strcmp() are probably mistakes too.
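
The volatile-offset trick used for the clang runs can be sketched roughly
as follows.  This is an assumed reconstruction, not the actual test
program: the function name count_equal and its parameters are
hypothetical, and only the str1 + v idea comes from the description
above.

```c
#include <stddef.h>
#include <string.h>

/* A volatile zero: the compiler must reload it on every use. */
static volatile int v = 0;

/*
 * Hypothetical benchmark loop.  Since v is re-read on each iteration,
 * the compiler cannot assume str1 + v is the same pointer every time,
 * so it cannot treat the comparison as loop-invariant and hoist or
 * drop it.
 */
long
count_equal(const char *str1, const char *str2, size_t len, long iterations)
{
	long equal = 0;
	long i;

	for (i = 0; i < iterations; i++) {
		if (strncmp(str1 + v, str2, len) == 0)
			equal++;
	}
	return (equal);
}
```

Returning the count and using it in the caller keeps the loop result
live; timing code would be wrapped around the call to count_equal().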

i386, Xeon 5650 @ 2.67GHz:
    clang -O [-march=native makes no difference]
        12.0 Gcycles  --  libc strncmp() (asm source, external linkage)
        15.1 Gcycles  --  libc strncmp() (copy of the C version)
        11.5 Gcycles  --  My Implementation
    clang is even more confused by the copy of libc C strncmp() than
    gcc-4.2.1.

Bruce


