long double broken on i386?

Thu Nov 15 12:17:19 PST 2007

On Sat, Nov 03, 2007 at 03:53:19PM +1100, Bruce Evans wrote:
> This thread is old, so I quoted too much.

I quote very little, but it remains an old thread.  I'm
sure I have attributions wrong.

> % Index: msun/src/s_cosl.c
> % ...
> % +/*
> % + * Table of constants for 2/pi, 396 Hex digits (476 decimal) of 2/pi.
> % + * The table was taken from e_rem_pio2.c.
> % + */
> % +static const int32_t two_over_pi[] = {
> % +0xA2F983, 0x6E4E44, 0x1529FC, 0x2757D1, 0xF534DD, 0xC0DB62, 
> % +0x95993C, 0x439041, 0xFE5163, 0xABDEBB, 0xC561B7, 0x246E3A,
> 
> This gives far too many copies of this table (instead of just too many).
> The float and double versions avoid having a copy for each of cos/sin/tan
> by doing arg reduction in e_rem_pio2[f].c.  The extra layer for this file
> is a pessimization but seems to be worth it to avoid the duplication.
> 
> Are 396 hex digits enough?  It can't be right for all of the float, double
> and long double cases.  I hope it is enough for 128-bit long doubles, and
> the float case is just sloppy in using the same table as the double case.
> If 396 hex digits is enough for all cases, then the table should be in
> k_rem_pio2.c.

I started to rework my sinl(), cosl(), and tanl() (and __kernel_*)
routines.  In doing so, I've taken, for example, __kernel_cos() and
changed it to __kernel_cosl() keeping the "extra precision" algorithm.
For reference, see code below.  Testing shows that I have ULPs exceeding
38 for cosl() with this algorithm.  If the "extra precision" algorithm
is correct, then perhaps your question "Are 396 hex digits enough?" is
relevant. 

So, I poked around and found

http://docs.sun.com/app/docs/doc/801-7639/6i1ucu1uh?a=view

which contains the statement:

   libm and libsunmath trigonometric functions use an "infinitely" precise
   pi for argument reduction.  The value 2/pi is computed to 916 hexadecimal
   digits and stored in a lookup table to use during argument reduction.

I'm looking into this now.

-- 
Steve

static const long double xxx[10] = {
	-0x8.000000000000000p-4L, /* The - here, changes the sign of hz. */
	 0xa.aaaaaaaaaaaaaabp-8L,
	-0x5.b05b05b05b05b05p-12L,
	 0x1.a01a01a01a019dbp-16L,
	-0x4.9f93edde27cbed7p-24L,
	 0x8.f76c77fc4bbff37p-32L,
	-0xc.9cba54236b4e97fp-40L,
	 0xd.73f9aa92060766ap-48L,
	-0xb.4105ba69deeed1ap-56L,
	 0x7.7e10d7644cea922p-64L,
};

#define C	xxx

long double
__kernel_cosl(long double x, long double y)
{
	long double hz, r, z, w;

#if 1
	/*
	 * XXX - Although y supposely carries extra precision to allow
	 * better accuracy, the following lines are needed to get <= 1 ULP.
	 * Note, very simply testing with arg in [-2000:2000] shows ULPs
	 * that exceed 38 without this.
	 */
	x += y;
	y = 0;
#endif

	z = x * x;

	r = z * (C[1] + z * (C[2] + z * (C[3] + z * (C[4] + z * (C[5] +
	    z * (C[6] + z * (C[7] + z * (C[8] + z * C[9]))))))));

	hz = (float)C[0] * z;
	w = 1 + hz;

	return (w + (((1 - w) + hz) + (z * r - x * y)));

}