svn commit: r213281 - head/lib/libc/amd64/gen

Thu Sep 30 03:46:22 UTC 2010

On Wed, 29 Sep 2010, Jung-uk Kim wrote:

> On Wednesday 29 September 2010 05:20 pm, Dimitry Andric wrote:
>> Log:
>>   Apply the same workaround for clang to amd64's version of ldexp.c
>> (as in r212976): order the incoming arguments to fscale as st(0),
>> st(1), and mark temp2 volatile (only in case of compilation with
>> clang) to force clang to pop it correctly.  No binary change when
>> compiled with gcc.
>
> Actually the binary slightly changes when compiled with gcc:
> 
> %diff -u ldexp-r1.14.c ldexp-r1.15.c
> --- ldexp-r1.14.c       2010-09-29 17:44:45.000000000 -0400
> +++ ldexp-r1.15.c       2010-09-29 17:45:10.000000000 -0400
> @@ -34,7 +34,9 @@
> static char sccsid[] = "@(#)ldexp.c    8.1 (Berkeley) 6/4/93";
> #endif /* LIBC_SCCS and not lint */
> #include <sys/cdefs.h>
> -__FBSDID("$FreeBSD: src/lib/libc/amd64/gen/ldexp.c,v 1.14 2007/01/09 00:38:24 imp Exp $");
> +__FBSDID("$FreeBSD: src/lib/libc/amd64/gen/ldexp.c,v 1.15 2010/09/29 21:20:29 dim Exp $");
> ...
> With -O1 and above, the FXCH completely disappears from the old
> version by rearranging stack operations, which is even more
> interesting.
>
> Don't get me wrong, both work fine.  FYI, verified with this:

This file probably shouldn't exist, especially on amd64.  There are 4 or 5
versions of ldexp(), and this file implements what seems to be the worst
one, even without the bug.

First, it shouldn't exist since it is a libm function.  It exists for the
historical reason that its object file has always been in libc.  This
causes organizational problems.

The second version is in fdlibm.  This wasn't imported into FreeBSD.  It
calls scalbn() after checking some cases.  I think it shouldn't check
anything.  In FreeBSD it could be a weak alias to scalbn().
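
A minimal sketch of what such a weak alias could look like, assuming an
ELF target and gcc or clang.  The names my_scalbn and my_ldexp are
hypothetical, and the body of my_scalbn is only a stand-in so that the
example is self-contained; it is not the fdlibm code.  In the FreeBSD
tree the __weak_reference() macro from <sys/cdefs.h> would be the usual
spelling instead of the raw attribute.

```c
/*
 * Stand-in for scalbn(): scales x by 2**n with plain multiplications.
 * Hypothetical; the real function treats subnormals, overflow and
 * rounding far more carefully.
 */
double
my_scalbn(double x, int n)
{
	for (; n > 0; n--)
		x *= 2.0;
	for (; n < 0; n++)
		x *= 0.5;
	return (x);
}

/* my_ldexp becomes a second, weak name for the same object code. */
double my_ldexp(double, int) __attribute__((weak, alias("my_scalbn")));
```

With this, my_ldexp(1.0, 3) and my_scalbn(1.0, 3) run the same code, and
no wrapper that checks special cases first is needed.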

The third version is in fdlibm.  This one is named scalbn().  FreeBSD has
it.  FreeBSD aliases ldexpl() to scalbn() iff long doubles are the same as
doubles.  FreeBSD also has scalbnf().  This came from NetBSD/Cygnus's
extension of fdlibm.  FreeBSD aliases ldexpf() to scalbnf() (or is it
the other way?).

The fourth version is in the FreeBSD arch-dependent directories of
lib/msun for at least amd64 and i386.  These are also named scalbn().
These aren't in fdlibm, but came from NetBSD.  These are written in
non-inline asm and are similar to the ones in libc.  They are a couple
of instructions shorter, due to never using a frame pointer (unless
profiling) and avoiding an fxch or two.  They aren't aliased to anything,
and don't have float versions.

The fifth version, which might not exist, is gcc's builtin.  I think it
doesn't really exist, but gcc says it has a builtin ldexp() and I had to
fight with that builtin to run the tests.  gcc normally made the dubious
optimization of moving ldexp() out of a test loop.  But ldexp() has side
effects (it can set errno and raise floating-point exceptions).
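
To illustrate the kind of side effect meant here, a small sketch follows.
The helper name ldexp_overflowed is hypothetical.  Whether errno is
actually set on overflow depends on the libm (see math_errhandling), so
the sketch only records it and does not rely on a particular value.

```c
#include <errno.h>
#include <math.h>

/*
 * Hypothetical helper: calls ldexp() and reports what the call did to
 * errno.  Returns 1 if a finite x overflowed to infinity, else 0.
 */
int
ldexp_overflowed(double x, int n, int *saved_errno)
{
	double r;

	errno = 0;
	r = ldexp(x, n);
	*saved_errno = errno;	/* ERANGE with some libms, 0 with others */
	return (isfinite(x) && isinf(r));
}
```

A call such as ldexp_overflowed(1.0, 100000, &e) returns 1, and a
compiler that moves or drops the ldexp() call has to preserve whatever
the library did to errno and to the exception flags.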

Testing indicates that the fdlibm C version is 2.5 times faster than the
asm versions on amd64 on a core2 (ref9), while on i386 the C version is
only 1.5 times faster.  The C code is a bit larger so benefits more from
being called from a loop.  The asm code uses a slow i387 instruction, and
on amd64 it has to do expensive moves from xmm registers to i387 ones and
back.

Times for 100 million calls:

     amd64 libc ldexp:      3.18 seconds
     amd64 libm asm scalbn: 2.96
     amd64 libm C scalbn:   1.30
     i386  libc ldexp:      3.13
     i386  libm asm scalbn: 2.86
     i386  libm C scalbn:   2.11

All compiled with -O -static -fno-unit-at-a-time.  The test loop was

% 	double d, rv;
% 
% 	rv = 0;
% 	for (d = 0; d < 100000000; d++)
% 		rv += X(d, 3);

where X = ldexp or scalbn.  gcc now optimizes away loops like this too
much.  I checked that it was called 100 million times in most cases.
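
One way to keep the compiler from touching the call, sketched under the
assumption that an indirect call is acceptable overhead for the test:
go through a volatile function pointer.  The names scale_fn and run_loop
are hypothetical, and this is not the harness that produced the times
above.

```c
#include <math.h>

/* Hypothetical type name, for illustration only. */
typedef double (*scale_fn)(double, int);

/*
 * Runs the loop from the message for d = 0 .. n-1.  The pointer is
 * volatile, so it is reloaded on every iteration and the compiler
 * cannot substitute a builtin, inline the call, or drop it.
 */
double
run_loop(scale_fn f, double n)
{
	scale_fn volatile fn = f;
	double d, rv;

	rv = 0;
	for (d = 0; d < n; d++)
		rv += fn(d, 3);
	return (rv);
}
```

Timing run_loop(ldexp, 100000000) and run_loop(scalbn, 100000000) with
clock() or time(1) would then compare the two entry points, at the cost
of measuring an indirect call instead of a direct one.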

Bruce