kern/133583: [libm] fma(3) does not respect rounding mode using
extended precision
Alexander Best
arundel at freebsd.org
Tue Nov 23 23:50:10 UTC 2010
The following reply was made to PR kern/133583; it has been noted by GNATS.
From: Alexander Best <arundel at freebsd.org>
To: bug-followup at freebsd.org
Cc:
Subject: Re: kern/133583: [libm] fma(3) does not respect rounding mode using extended precision
Date: Tue, 23 Nov 2010 23:40:18 +0000
this is what bruce evans wrote concerning this issue.
cheers.
alex
----- Forwarded message from Bruce Evans <brde at optusnet.com.au> -----
Date: Tue, 16 Nov 2010 05:44:03 +1100 (EST)
From: Bruce Evans <brde at optusnet.com.au>
To: Alexander Best <arundel at FreeBSD.org>
cc: Bruce Evans <brde at optusnet.com.au>, Ulrich Spoerlein <uqs at FreeBSD.org>,
das at freebsfd.org
Subject: Re: svn commit: r215237 - head/lib/msun/src
[Cc trimmed]
On Mon, 15 Nov 2010, Alexander Best wrote:
>if you are interested in solving two more msun mysteries, you might want to
>have a look at #PR kern/133583 and standards/143358.
Here is a quick fix for #133583.
% Index: s_fma.c
% ===================================================================
% RCS file: /home/ncvs/src/lib/msun/src/s_fma.c,v
% retrieving revision 1.5
% diff -u -2 -r1.5 s_fma.c
% --- s_fma.c 3 Apr 2008 06:14:51 -0000 1.5
% +++ s_fma.c 15 Nov 2010 17:44:48 -0000
% @@ -170,5 +170,5 @@
% zs = ldexp(zs, -spread);
% r = c + zs;
% - s = r - c;
% + *(volatile *)&s = r - c;
% rr = (c - (r - s)) + (zs - s) + cc;
%
The basic problem is that s_fma.c uses Dekker's algorithm, which assumes
working floating point, i.e., not the floating point given by gcc on
i387. FreeBSD defaults to rounding to double precision so as to mostly
avoid this problem for doubles, and msun tries to and mostly succeeds
in avoiding it using direct methods for floats (using STRICT_ASSIGN()
and sometimes a more direct volatile cast or variable), but setting
extended precision as the test program does exposes all double precision
functions to it. This is 1 reason why the regression tests shouldn't
configure extended precision for double precision tests (unless they
want to test this, and we're not really ready for this since many other
double precision functions are known to be broken when extended precision
is configured).
The problem was obvious since it went away with -O0 and with -ffloat-store.
The above patch is the result of sprinkling volatiles around until I
found one that worked and then removing the ones that made no difference.
Perhaps many more are needed. In particular, the splitting into high
and low parts won't work as intended. The test in the PR somehow
worked without fixing this, probably just because it uses special args that
are already have enough low bits.
Note that STRICT_ASSIGN() doesn't work for doubles (is just null),
since it is intentionally optimized for the default FreeBSD rounding
precision, and using it (when it is not null) is a large pessimization.
The following functions in msun/src (and few or none elsewhere) use it:
% e_rem_pio2.c: STRICT_ASSIGN(double,fn,x*invpio2+0x1.8p52);
% e_rem_pio2f.c: STRICT_ASSIGN(double,fn,x*invpio2+0x1.8p52);
% k_rem_pio2.c: STRICT_ASSIGN(double,fw,fw);
% s_exp2.c: STRICT_ASSIGN(double, t, x + redux);
% s_exp2f.c: STRICT_ASSIGN(float, t, x + redux);
% s_log1p.c: STRICT_ASSIGN(double,u,1.0+x);
% s_log1pf.c: STRICT_ASSIGN(float,u,(float)1.0+x);
% s_rint.c: STRICT_ASSIGN(double,w,TWO52[sx]+x);
% s_rint.c: STRICT_ASSIGN(double,w,TWO52[sx]+x);
% s_rintf.c: STRICT_ASSIGN(float,w,TWO23[sx]+x);
% s_rintf.c: STRICT_ASSIGN(float,w,TWO23[sx]+x);
All the double-precision uses here are null and are just clones of the
float precision code where the use avoids a problem that occurs in the
usual case. Since the double precision uses are null, all the functions
with such uses are broken if someone configures extended precision.
This includes all trig functions (via *_rem_pio2.c). I have fixed
this locally only for the trig functions. In k_rem_pio2_.c, I essentially
just use a variant of STRICT_ASSIGN(), but a better way is to add and
subtract a bias (something like 0x1.8p53 (52 DBL_MANT_DIG - 1) to push
the unwanted bits to oblivion). I use such a bias to avoid the
STRICT_ASSIGN() in in s_exp2f.c:
% --- s_exp2f.c Fri Feb 22 14:46:43 2008
% +++ z22/s_exp2f.c Fri Feb 22 14:48:37 2008
% @@ -33,14 +33,50 @@
% #include "math_private.h"
%
% +/* To be moved nearer to <float.h>. */
% +#if FLT_EVAL_METHOD == 0
% +#define FLT_T_MANT_DIG FLT_MANT_DIG
% +#else
% +/*
% + * XXX this hack works for all supported arches. There is some doubt that
% + * float_t actually is double on arm and powerpc as claimed in <math.h>,
% + * but it is certainly not long double, and the reduction only needs
% + * float_t to be at least as large as the evaluation precision.
% + */
% +#define FLT_T_MANT_DIG DBL_MANT_DIG
% +#endif
% +
% #define TBLBITS 4
% #define TBLSIZE (1 << TBLBITS)
%
% +/*
% + * Ensure that redux fits in a float. float_t would always work, but
% + * float works on all supported arches and is more efficient on some.
% + */
% +#if FLT_T_MANT_DIG >= FLT_MAX_EXP + TBLBITS
% +#error "Unsupported type for float_t"
% +#endif
% +
% static const float
% huge = 0x1p100f,
% - redux = 0x1.8p23f / TBLSIZE,
% + redux = __CONCAT(0x1.8p, FLT_T_MANT_DIG) / 2 / TBLSIZE,
% +/*
% + * Domain [-0.03125, 0.03125], range ~[-6.015e-11, 5.960e-11]
% + * |exp2(x) - p(x)| < 2**-33.95
% + */
% P1 = 0x1.62e430p-1f,
% P2 = 0x1.ebfbe0p-3f,
% P3 = 0x1.c6b348p-5f,
% P4 = 0x1.3b2c9cp-7f;
% +#if 0
% +/*
% + * Domain [-0.03125, 0.03125], range ~[-2.4881e-12, 2.4881e-12]:
% + * |exp2(x) - p(x)| < 2**-38.54
% + */
% +static const double
% + P1 = 6.9314718016256749e-1, /* 0x162e42fec39c72.0p-53 */
% + P2 = 2.4022650689211292e-1, /* 0x1ebfbdff5df1dd.0p-55 */
% + P3 = 5.5505736316289703e-2, /* 0x1c6b3f74700eea.0p-57 */
% + P4 = 9.6183434022792860e-3; /* 0x13b2c832d5fd3b.0p-59 */
% +#endif
%
% static volatile float twom100 = 0x1p-100f;
% @@ -69,4 +105,8 @@
% *
% * Accuracy: Peak error < 0.501 ulp; location of peak: -0.030110927.
% +#if 0
% + * Accuracy: Peak error < 0.50004 ulp; location of peak: -0.00918653142
% + * 5168 cases out of 2**32 are incorrectly rounded in round-to-nearest
mode.
% +#endif
% *
% * Method: (equally-spaced tables)
% @@ -95,5 +135,5 @@
% {
% double tv, twopk, u, z;
% - float t;
% + float_t t;
% uint32_t hx, ix, i0;
% int32_t k;
% @@ -118,12 +158,18 @@
%
% /* Reduce x, computing z, i0, and k. */
% - STRICT_ASSIGN(float, t, x + redux);
% + t = x + redux;
% +#if FLT_T_MANT_DIG == 24
% GET_FLOAT_WORD(i0, t);
% +#elif FLT_T_MANT_DIG == 53
% + GET_LOW_WORD(i0, t);
% +#else
% +#error "Unsupported type for float_t"
% +#endif
% ...
This also has complications to use float_t. float_t will normally be
double or long double, and the bias (redux here) depends on the number
of bits in the type used in the redux. The basic idea is to use the
natural type for operations (float_t, not float here), so that adding
and subtracting biases just works and the correct bias can be known
at compile time. This is painful to configure (see above), is broken
by anywone configuring extra precision (e.g., the 53 in the above
depends on the precision remaining at double).
My uncommitted logl() uses a modified subset of Dekker's algorithm without
hitting the problems with extra precision or needing to use STRICT_ASSIGN()
to avoid them (any extra precision just gives extra precision in the result).
I think the additive part doesn't really care about extra precision. It
The multicative part cares since it needs the hi*hi multiplications to be
exact, so the hi parts must not have extra bits.
Bruce
----- End forwarded message -----
--
a13x
More information about the freebsd-bugs
mailing list