long double broken on i386?

Bruce Evans brde at optusnet.com.au
Wed Dec 5 21:15:53 PST 2007

On Wed, 5 Dec 2007, Steve Kargl wrote:

> On Wed, Dec 05, 2007 at 01:51:32PM -0500, David Schultz wrote:

>> Having a version that works for machines with the 128-bit floating
>> point format is pretty important. (These would probably be two
>> separate files; I've been thinking we should add a msun/ieee96
>> directory and a msun/ieee128 directory or something.)
>
> This is probably a good idea if freebsd wants to maintain the
> same algorithm across, say, sinf, sin, and sinl.  I can produce
> code for the ieee128 case, but I have no way to test it.

I guarantee that it will have bugs if it is not tested :-).

> An alternative may be to use table-driven implementations where
> the intervals are defined by exactly representable values in ieee96
> (i.e., 64-bit significand), which are then also exactly representable
> in ieee128.  I haven't investigated how many intervals one would
> need nor the best interpolation scheme.

My best implementation of logl does this to a large extent (see your
mailbox).  It uses an variable interval length of ~1/128 and rounds the
endpoints to float precision (endpoints are round(1 + i/128)).  Full
precision is still needed for the values at the endpoints, but the
full-precision values are represented as sums of lower-precision
values to optimize for amd64 and i386.

This won't work so well for sparc64.  The interval length of 1/128
gives a rounding error for the polynomial of about (1/128)^2 53-bit
ULPs = 2^-14 53-bit ULPs = 2^-3 64-bit ULPs, (due to the coeff of the
x^2 term possibly having a 1 53-bit ULP rounding error), and the Remez
approximation method reduces this to about 2^-11 64-bit ULPs, with
113-bit precision and 53-bit coeffs the interval length needs to be
more like sqrt(2^(53-113)) = 2^-30 which is too small to implement.
Hopefully Remez helps even more here.  The trig family (especially
cos) doesn't need such a small interval as log(), since its power
series converge faster and have every second term 0 and thus exactly
representable.

You don't want to do this initially for the trig family since it adds
complexity.  The Intel ia64 library uses intervals for almost everything,
including the trig family.  Splicing at the endpoints and relating the
endpoints to multiples of pi/2 makes it more complicated for the trig
family than for the (real) log family.

Bruce