cvs commit: src/sys/i386/include _types.h

Bruce Evans brde at
Thu Mar 6 23:05:16 UTC 2008

Eek, my mailbox filled up overnight.  Trying to reply to only some details
in LIFO order...

On Wed, 5 Mar 2008, Colin Percival wrote:

> David Schultz wrote:
>> On Wed, Mar 05, 2008, Colin Percival wrote:
>>> Setting the i387 FPU to 53-bit precision gives standards-compliant
>>> behaviour whether people are using "double" or "long double".
>> Not quite.
> How is it not standards-compliant?
>>> Yes and no.  Double rounding isn't allowed;
>> Yes, it is.
> No, it isn't.  The following code

Yes it is :-).  Extra precision for intermediate values implies double
rounding for the final results, and extra precision for intermediate
values is a fundamental part of C (it used to be only that float
expressions were (de facto) required to be evaluated in double
precision, but now it affects double expressions too, and C99 has vast
complications to support i387-ish extra precision).

> 	double x, y, z;
> 	x = 1.0;
> 	y = 0x1.0p-53 + 0x1.0p-75;
> 	z = x + y;
> 	z = z - 1.0;
> 	if (z == 0.0)
> 		printf("BUGGY MATH\n");
> 	else
> 		printf("IEEE754-COMPLIANT MATH\n");
> should never output "BUGGY MATH", since x + y should round up to
> 1 + 2^(-52); but if you set the i387 to use extended precision, it
> gets the wrong answer.

In C, with FLT_EVAL_METHOD = 2 (evaluate all FP expressions in the range
and precision of the long double type) and long doubles actually longer
than doubles, this always gives double rounding:

 	x + y       is evaluated (and presumably rounded) in long double
 	z = x + y   rounds to double precision
 	            (but with gcc-i386, it doesn't round;
 	            -O0 happens to fix this, as does the -ffloat-store hack)
 	z           is now 1.0, except with gcc-i386 -O -fno-float-store,
 	            when it is 1.0L + 0x1.0p-53L
 	z - 1.0     is evaluated in long double precision
 	z = z - 1.0 rounds to double precision (except with gcc-i386...)
 	z           is now 0.0,
 	            except with gcc-i386 -O -fno-float-store it is 0x1.0p-53L
 	if (z == 0.0)
 	            fails to detect the bug for gcc-i386 -O -fno-float-store,
 	            since z is still long double and nonzero

Another advantage of using double_t is that you can control when the
double rounding occurs and not have to understand so many sub-cases
from the compiler bugs.  The result is then always 0x1.0p-53L (if
double_t is long double, etc.), and if you want a double result then
you can round as the last step.  The result is then 0x1.0p-53.  You
don't get the double precision result of 0x1.0p-52, but you get a more
accurate result.  Extra precision is of course intended to deliver a
more accurate result in this way.

This assumes 64-bit precision.  With 113-bit precision for long doubles,
all intermediate values and the result would be exact; the result would
be y.  It seems to be impossible to choose y to demonstrate the problem
for 113-bit long doubles.  We see a similar phenomenon for floats vs
doubles -- since the extra precision provided by doubles is actually
(more than) double, double rounding never (?) occurs.  Extended
precision should have had more than 64 bits.  I think it is mainly the
result of Intel's transistor budget being limited 30 years ago.

>>>> The downside is that this breaks long doubles.
>>> Except that it doesn't.  Using 53-bit precision for long doubles is
>>> a perfectly valid and standards-compliant option.
>> It's impossible to implement efficient library functions that
>> produce correct long double results
> You could have stopped this sentence here -- for all practical purposes,
> correctly rounded trigonometric functions are not feasible.

Nah, it's quite feasible, especially for transcendental functions since
transcendental numbers in general can be approximated more closely by
rationals than algebraic numbers.  The evaluation just needs to be
done in extra precision, and at least for trig functions, "in general"
happens in practice so relatively few cases require significant extra
precision.  With extra precision in hardware, all except the relatively
few cases can even be implemented efficiently.

>> when the FPU is set to 64-bit
>> mode and avoid double rounding and related problems when the FPU
>> is set to 53-bit mode.
> Fortunately, library functions aren't required to have any particular
> error bounds.  It's always nice if they are correct to within 0.6ulp
> instead of 0.8ulp or suchlike -- but nobody should be proving code
> correctness based on assumed properties of library transcendental
> functions.  People should, and do, prove code correct based on the
> assumption that double precision arithmetic behaves like double precision
> arithmetic, however (myself included).

We require error bounds of < 1 ulp and aim for < 0.5[0]1 ulps.  Error
bounds are required for everything.  E.g., libm uses expm1() to implement
hyperbolic functions and needs good error bounds for expm1() and a
nontrivial error analysis to justify using it (since the error can blow
up and would blow up if exp(x)-1 were used instead of expm1(x)), yet it
still has an error of > 1 ulp since expm1() and other things aren't
accurate enough.


More information about the cvs-src mailing list