long double broken on i386?

Wed Dec 5 10:51:46 PST 2007

On Mon, Dec 03, 2007, Steve Kargl wrote:
> On Mon, Dec 03, 2007 at 02:44:08AM -0500, David Schultz wrote:
> > Is the latest version of the patch the one you sent to the list?
> > If cosl and friends produce accurate answers within a reasonable
> > domain, it's worth committing; the whole business about how
> > accurate cosl(1000000000) is can be dealt with later.
> 
> No, I've rewritten most of the code (numerous times in fact :-).
> 
> First, I followed Bruce's recommendation to compute the
> ULP in the higher precision available in GMP/MPFR instead 
> of the long double type.  The old method appears to 
> have been using a chop rounding because I would get
> only 0.5, 1.0, 1.5, ... ULPs (ie, I never saw anything like
> 0.6312 ULP).  With the new method, I see ULPs in the 0.6
> to 0.75 range.  I haven't had time to implement a full
> blown Remes algorithm to see if I can reduce the ULP. 
> 
> Second, I've implemented the algorithms found in 
> s_cos.c, s_sin.c, and s_tan.c  that use the extra
> precision from the argument reduction.  If I use
> these algorithms, then the ULPs go up!  For example,
> the ULP for sinl() goes up to about 1.5.  Bruce
> had suggested that the 396 hex digits in the table
> for 2/pi may need to be checked.  I found a paper
> by K.C. Ng at David Hough's website that (if I read it
> correctly) suggests that we need a much larger
> table.  So, I need to check how well argument reduction
> is working.

Having a version that works for machines with the 128-bit floating
point format is pretty important. (These would probably be two
separate files; I've been thinking we should add a msun/ieee96
directory and a msun/ieee128 directory or something.)

But honestly, I've tried to wrestle with the argument reduction
stuff before, and my advice is to not kill yourself over it.  You
need tons of extra precision and an entirely different computation
for huge numbers, and there are other things you could spend your
time on that would have a bigger impact. If someone tries to
compute cosl(10^20 * pi/3) using libm, for example, they're going
to get the wrong answer anyway. When 10^20 * pi/3 is expressed in
extended precision, the rounding error is more than pi, so it
doesn't matter how accurately you compute the cosine because the
input is totally out of phase. While it might be nice to say that
we have accurate argument reduction and blah blah blah, but it's
of little practical value.

That's not to say that we need no argument reduction at all. For
instance, cosl(5*pi/3) should still give an accurate answer. But
when the input is so huge that the exponent is bigger than the
number of significant bits in the mantissa, you lose so much
precision in the input that it's not as important anymore. That's
probably why Intel decided to make its trig instructions only work
up to 2^63 before requiring explicit argument reduction.