misc/84920: math programs reporting incorrect values
bde at zeta.org.au
Tue Aug 16 11:51:44 GMT 2005
On Sun, 14 Aug 2005, Colin King wrote:
> Both of these programs are giving me incorrect results when subtracting
> floating-point numbers, so I'm assuming that it is either something
> wrong with libm, libc, or gcc.
Both of which programs?
> Go to either one of these programs and type an expression that uses
> at least one floating point number and a subtraction. For example, I
> used 30.00-29.05 as my expression. In e, the result is 0.949999.... In
> KDE's kcalc, the result is 0.9500000000000001776356839400250464677811,
> or 0.9500000000000002 after rounding.
This seems to be more or less correct if the programs use plain double
precision floating point (*). 1/5 is not exactly representable in
base 2 floating point, so 29.05 is not exactly representable. Its
best approximation in IEEE double precision is 29.050000000000000711...,
and subtracting this from 30 is likely to increase the relative error;
the result is 0.94999999999999928946...
The bug may be that the programs use long double precision and LDBL_DIG
is broken on i386's. Long double precision is actually configured to be
almost the same as double precision on i386's to avoid other precision
bugs. Thus programs that naively trust LDBL_DIG might show garbage digits
that should be avoided by rounding. DBL_DIG is 15, and the value of the
approximation to 29.05 shows that it is necessary to round to DBL_DIG-1
digits to eliminate garbage digits even when there are no roundoff errors:
29.05 is represented as 29.050000000000000711...,
rounding this to DBL_DIG digits gives 29.050000000000001,
rounding this to DBL_DIG-1 digits gives 29.05000000000000.
Similarly for the difference:
(30.00 - 29.05) is represented as 0.94999999999999928946...
rounding this to DBL_DIG digits gives 0.949999999999999
rounding this to DBL_DIG-1 digits gives 0.95000000000000
Your "0.9500000000000002 after rounding" has DBL_DIG+1 digits.
I generated the above approximations using gcc and checked them using gp.
gcc calculated 29.05 at compile time and the difference at run time.
gcc was relatively recently "fixed" so that evaluation of constant
expressions at compile time matches the corresponding evaluation at
runtime even when you don't want it to (e.g., "long double x = 29.05L;"
gives only double precision accuracy). OTOH, calculator programs by
nature use runtime evaluation for most constants.
gp uses multiple (arbitrary) precision with a default of 28 digits and
I used my library functions for it to convert to double precision.
The above may explain the behaviour of "e" (?). The behaviour of kcalc
seems to be unrelated. The magic number
0.9500000000000001776356839400250464677811 for kcalc is precisely
30.00 - 29.05 in 55-bit or 56-bit precision. This can't be related
to libm, libc or gcc. It might be from calculating things correctly
using multiple (56-bit) precision and then printing the values using
a silly number of digits. Well, it might be related to libc -- if you
have a floating point number with 56 bits of precision, then the number
can be represented as a long double with 64 bits of precision, and libc
is now supposed to be able to print such numbers perfectly with any
number of digits. DBL_DIG+1 is about right for 56 bits of precision,
so perhaps the only bug in kcalc here is that it rounds
0.9500000000000001776... to 1 more digit than it should
(DBL_DIG+1) instead of (DBL_DIG+1 - 1).
(*) Calculator programs should not use C floating point with any precision
except in special cases; they should use multiple precision and keep track
of precisions so as not to print garbage digits...