# long double broken on i386?

Mon Oct 1 17:14:59 PDT 2007

```On Mon, Oct 01, 2007 at 07:56:28PM +1000, Bruce Evans wrote:
> On Fri, 28 Sep 2007, Steve Kargl wrote:
>
> >So, in my never ending crusade to implement missing
> >C99 math functions, I decided to tackle sinl, cosl,
> >and tanl.  I have implementations that use argument
> >reduction from k_rem_pio2.c and Chebyshev interpolation
> >for the trig functions over the range [0,pi/4].
> >
> >On amd64, these routines give <= 1/2 ULP for the various
> >values I've tested in the reduced range.  Now, if I test
>
> I doubt this :-).  Getting <= 1 ULP is hard; getting <= 1/2 + 0.001
> ULPs is harder, and getting <= 1/2 ULPs without using an enormous
> amount of extra precision is a research problem.

It appears that I was not clear in what I had intended to write.
Let's try again.  I've implemented the necessary argument
reduction using k_rem_pio2.c for a 64-bit long double.  The
comments in k_rem_pio2.c do not discuss the accuracy.  It is noted
that giving the reduced argument to one of my algorithms
does not suddenly recover any lost precision.

However, given a reduced value in the range [0,pi/4], the
algorithms that I have for sinl(), cosl(), and tanl() over
this range give <= 1/2 ULP "for the various values I've
tested."  Note, I did not claim to test *all* values in
[0,pi/4].  I've only tested the algorithms with several tens
of millions of randomly generated values.

It should also be noted that I'm using Goldberg's definition
of ULP where the "exact" result is computed via MPFR with
512 bits of precision and then the mpfr value is converted
to long double.

/*
 * y is the approximate value.  z is the long double result
 * of converting the 512 bit MPFR number in round-to-nearest
 * mode.
 */
long double
ulpl(long double y, long double z)
{
	int n;
	long double f;

	f = frexpl(y, &n);
	f = fabsl(f - ldexpl(z, -n));
	f = ldexpl(f, LDBL_MANT_DIG - 1);
	return (f);
}
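
As a quick sanity check of ulpl() (my driver, assuming
LDBL_MANT_DIG is 64, as on i386/amd64): for y one ulp above 1.0,
frexpl() normalizes y into [0.5,1), and ulpl() returns 0.5:

%%%
#include <assert.h>
#include <float.h>
#include <math.h>

/* ulpl() as above. */
long double
ulpl(long double y, long double z)
{
	int n;
	long double f;

	f = frexpl(y, &n);
	f = fabsl(f - ldexpl(z, -n));
	f = ldexpl(f, LDBL_MANT_DIG - 1);
	return (f);
}

int
main(void)
{
	long double y = 1.0L + 0x1p-63L;	/* one 64-bit ulp above 1 */

	assert(ulpl(y, 1.0L) == 0.5L);
	return (0);
}
%%%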

> >these on i386, I see between 400 and 1200 ULP.  I spent
> >a week or so trying to understand the discrepancy and "fix"
> >the problem using double-double precision arithmetic.
> >
> >I finally found the problem! /usr/include/float.h on i386
> >is lying about its numerical representation of long double.
>
> This problem is well known.  I must have referred to it many times
> in old mail to you.

You've told me many things about floating point in FreeBSD.
Unfortunately, I have a small brain with only so much capacity.

> FreeBSD uses my old workaround for compiler bugs in double
> precision, of using a rounding precision of double for everything.

I remembered the workaround this weekend, and in fact using
fpsetprec to set 64-bit precision on my i386 system generated
results consistent with my amd64 system.
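
The 53- vs 64-bit difference is easy to see directly (a portable
check of my own, assuming x87-style 80-bit long doubles as on
i386/amd64; on i386 with the default 53-bit rounding precision the
long double sum below is rounded away too, which is exactly the
problem at hand):

%%%
#include <assert.h>
#include <float.h>
#include <stdio.h>

int
main(void)
{
	/* 2^-60 is below double's 53-bit precision, but within a
	   64-bit long double mantissa. */
	double d = 1.0 + 0x1p-60;
	long double ld = 1.0L + 0x1p-60L;

	assert(d == 1.0);	/* increment rounded away at 53 bits */
	assert(ld != 1.0L);	/* kept with a 64-bit mantissa */
	printf("DBL_MANT_DIG = %d, LDBL_MANT_DIG = %d\n",
	    DBL_MANT_DIG, LDBL_MANT_DIG);
	return (0);
}
%%%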

> Unfortunately, this is still necessary because the
> compiler bugs are still there.  This mostly breaks long double
> precision, and it doesn't fix float precision, but double precision
> is more important.  libm, in particular k_rem_pio2f.c (which is no
> longer used) needs tricky fixes to work around the bugs for float
> precision.  The fixes have mostly not been merged to the double
> precision functions -- we just depend on double precision mostly
> working.
>
> >In particular, at least these 3 values appear to be wrong
> >
> >#define LDBL_MANT_DIG   64
> >#define LDBL_EPSILON    1.0842021724855044340E-19L
> >#define LDBL_DIG        18
>
> These were broken in rev.1.9 in 2002.  The "slight rounding issues with
> long doubles" referred to in the log message are that long doubles are
> rounded to double precision when they are operated on, unless you change
> the rounding precision to long double using fpsetprec() or fesetprec().
> Even if you change the rounding precision, long doubles can only appear
> to work, but not actually work, unless you fix the compiler bugs or
> work around them in libraries and in your application.  It isn't clear
> if the "appear to work" in the log messages is with or without the change
> to the rounding precision.  As mentioned in the log message, printf() in
> 2002 didn't support printing long doubles at all, so it was hard to see
> loss of precision in printf output.  Now printf supports long doubles and
> %a, so it is easy to see lost precision using it.  E.g.:
>
> %%%
> #include <math.h>
> #include <stdio.h>
>
> int
> main(void)
> {
> 	printf("    sin(1) = %La\n", (long double)(double)sin(1));
> 	printf("2 * sin(1) = %La\n", (long double)(double)(2 * sin(1)));
> 	return (0);
> }
> %%%
>
> Output on i386:
>
> %%%
>     sin(1) = 0xd.76aa47848677021p-4
> 2 * sin(1) = 0xd.76aa47848677p-3
> %%%
>
> This exhibits a library bug, several compiler bugs, and the lost precision
> from my rounding precision hack:
>
> o library bug: sin() returns extra precision, despite its prototype saying
>   that it returns double.  Well, I used to think that this is a bug.  I
>   only learned a few months ago that it is a feature, since it matches
>   the corresponding feature for a function implemented in C.  On i386,
>   sin() is implemented in asm.  It just uses the hardware to evaluate
>   sin() in long double precision, and returns the long double result of
>   that.  The corresponding C feature is that a function that does
>   similar calculations is permitted to return extra precision:
>
>     double x, y;
>     long double foo() { return x+y; }
>
>   This is permitted to do the following:
>   - evaluate x+y in extra precision
>   - return the extra precision, since unlike casts and assignments, the
>     return operation is _not_ required to discard any extra precision.
>   For gcc on i386, this is what it would do without my rounding precision
>   hack, except possibly when compiled with -O0 and/or if the result of
>   x+y is spilled to memory (-O0 makes spilling quite likely).  The return
>   statement must be written as "return (double)(x+y);" to ensure removing
>   any extra precision _before_ returning.
>
>   Now I wonder if the implicit casts for function args with prototyped
>   functions are permitted to be as magic as the non-casts for return
>   statements.  Keeping extra precision in intermediate calculations is
>   good, and you don't really want to lose it unconditionally at function
>   call boundaries, and it's confusing to lose it on calls but not on
>   returns.

Sorry about the lack of trimming the above passage, but I was afraid
I might lose context if I deleted text.

Anyway, the above looks like a catch-22 for i386.  I can wrap
my implementation in fpsetprec pairs to get the desired precision,
but someone using libm may unknowingly lose precision.  The value
of LDBL_MANT_DIG in float.h actually caused my initial post because
I knew I lost at least 11 bits of precision (until I remembered
your workaround).  I thought that my polynomial approximations were
in error, so I tried many variations on the Chebyshev interpolation
method, Horner's method, summation with compensation, etc.
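
The explicit cast Bruce describes really does discard the extra
bits; on a machine with 80-bit long doubles, casting to double and
back rounds away the low 11 mantissa bits (a small check of my own):

%%%
#include <assert.h>
#include <stdio.h>

int
main(void)
{
	/* All 64 mantissa bits of this constant are significant. */
	long double x = 0xd.76aa47848677021p-4L;
	long double y = (long double)(double)x;	/* round to 53 bits */

	assert(y != x);		/* the cast discarded the low bits */
	printf("x = %La\ny = %La\n", x, y);
	return (0);
}
%%%

(This relies on the compiler keeping the full 64-bit constant; with
the FreeBSD precision hack Bruce describes, gcc rounds the
initializer at compile time and the two values would compare equal.)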

>   I'm using sin(1) mainly to generate a long double with nonzero in its
>   11 least significant mantissa bits.  Note that initializing a long
>   double to apparently-the-same value at compile time would not work:
>
>     long double x = 0xd.76aa47848677021p-4L;  /* loses extra precision */
>
>   This is because gcc on FreeBSD supports my precision hack to a fault,
>   and has bugs in the fault.  gcc now tries to do compile-time arithmetic
>   in the same precision as the runtime arithmetic, i.e., in only double
>   precision on FreeBSD.  This is wrong if there is no arithmetic involved,
>   as in the above initializer.  The runtime rounding precision only applies
>   to (function) operations on long doubles with full precision.  The above
>   initializer corresponds to a store of a value that has somehow been
>   calculated in full precision.
>
>   Note that this bug makes it more difficult to implement functions in
>   long double precision -- you cannot use tables of long double constants.

Ah ha!  This explains why my Chebyshev polynomial approximations
(i.e., nearly minimax approximations) can have between 10 and 100
ULP of error.

>   OTOH, loads of long double constants are so slow on i386 that such
>   constants should almost never be used.  Instead represent long doubles
>   as a sum of 2 doubles; loading 2 doubles and adding them is faster
>   than loading 1 long double, at least if the 2 extra operations can
>   be pipelined effectively.

Looks like I'll spend some time this week splitting the
coefficients into high and low parts.
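
A minimal sketch of that splitting, assuming an 80-bit long
double: round the constant to double for the high part; the
remainder fits exactly in the low double, and their sum recovers
all 64 bits:

%%%
#include <assert.h>
#include <stdio.h>

int
main(void)
{
	long double c = 0xd.76aa47848677021p-4L;   /* 64 significant bits */
	double hi = (double)c;			   /* top 53 bits */
	double lo = (double)(c - (long double)hi); /* low bits, exact */

	/* Loading two doubles and adding them reproduces c exactly. */
	assert((long double)hi + (long double)lo == c);
	printf("hi = %a, lo = %a\n", hi, lo);
	return (0);
}
%%%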

Once again, thanks for a detailed explanation of some of
the finer points of the 53 vs 64 bit issue.  Like many of
your other responses, I'll save this one and hopefully
retain some of the info.

--
Steve
```