long double broken on i386?

Wed Dec 5 19:35:40 PST 2007

On Wed, 5 Dec 2007, David Schultz wrote:

> But honestly, I've tried to wrestle with the argument reduction
> stuff before, and my advice is to not kill yourself over it.  You
> need tons of extra precision and an entirely different computation
> for huge numbers, and there are other things you could spend your
> time on that would have a bigger impact.

The work for this is mostly already done.  The new work is just to
ensure correctness of data fed into and read out of a hopefully
debugged algorithm.  That must be done anyway.

> If someone tries to
> compute cosl(10^20 * pi/3) using libm, for example, they're going
> to get the wrong answer anyway. When 10^20 * pi/3 is expressed in

Why would anyone do that? :-)

> extended precision, the rounding error is more than pi, so it
> doesn't matter how accurately you compute the cosine because the
> input is totally out of phase. While it might be nice to say that
> we have accurate argument reduction and blah blah blah, but it's
> of little practical value.

The Intel ia64 library and IIRC, LIA, has functions which take an arg
in degrees (cosd(), etc?) so that arg reduction is simpler and even
huge integer args (in degrees) can be handled perfectly.  It's strange
that degrees can work better than radians.  With only functions that
take args in radians, the loss of precison goes the other way, with
naive user code doing cosd(x) := cos(x * pi / 180) and losing a lot
of accuracy even for x = 360.

> That's not to say that we need no argument reduction at all. For
> instance, cosl(5*pi/3) should still give an accurate answer. But
> when the input is so huge that the exponent is bigger than the
> number of significant bits in the mantissa, you lose so much
> precision in the input that it's not as important anymore. That's
> probably why Intel decided to make its trig instructions only work
> up to 2^63 before requiring explicit argument reduction.

They don't actually work up to 2^63.  They work up to about 2^2 or 2^4
for extended precision and up to about 2^13 or 2^15 for double precision,
since their internal approximation to pi only has 66 or 68 bits
(ucbtest says 66(->68) bits; fdlibm has 3168 bits).

When Intel decided this for i87's, they only had a budget of 25000 (?)
gates and a few hundred million dollars (?).  Their budget is larger
now :-), but for ia64 they do it all in software (like fdlibm does I
think, but much more MD and optimized, especially for arg reduction).
Thus the old bugs of a bad internal approximation for pi and a broken
indicator for when better arg reduction is needed are fixed on ia64.

Bruce