Implementation of half-cycle trignometric functions

Fri Apr 28 08:55:18 UTC 2017

On Thu, 27 Apr 2017, Steve Kargl wrote:

> On Thu, Apr 27, 2017 at 04:14:11PM -0700, Steve Kargl wrote:
>>>
>>> The code is attached the bug reportr.
>>>
>>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=218514
>>
>> I have attached a new diff to the bugzilla report.  The
>> diff is 3090 lines and won't be broadcast the mailing list.
> ...
> Grrrr.  Find a sloppy theshold can be fun.
>
> Index: src/s_cospif.c
> ===================================================================
> --- src/s_cospif.c	(revision 1916)
> +++ src/s_cospif.c	(working copy)
> @@ -61,7 +61,7 @@
> 	SET_FLOAT_WORD(ax, ix);
>
> 	if (ix < 0x3f800000) {			/* |x| < 1 */
> -		if (ix < 0x39000000) {		/* |x| < 0x1p-13 */
> +		if (ix < 0x38800000) {		/* |x| < 0x1p-14 */
> 			if (huge + ax > 0)	/* Raise inexact iff != 0. */
> 				return (1);
> 		}

This looks too different from s_cosf.c:

XX 	if(ix <= 0x3f490fda) {		/* |x| ~<= pi/4 */
XX 	    if(ix<0x39800000)		/* |x| < 2**-12 */
XX 		if(((int)x)==0) return 1.0;	/* 1 with inexact if x != 0 */
XX 	    return __kernel_cosdf(x);
XX 	}

The (int)x == 0 method is perhaps not best (especially not the excessive
parentheses used in it), but it was verified to be faster than other
methods.  This might be a matter of tricking the compiler into not
optimizing the unusual case.  (__predict ugliness doesn't work any better.)

Non-formatting style bugs include:
- arguably worse spelling of powers of 2 in comments
- less information in the comment about inexact.  I try to change all
   similar comments to be consistent whenever I change their code.
   s_cos.c still has a variant with less detail "/* generate inexact".

s_sinl.c is still broken by your not copying the fdlibm code:

XX 	/* If x = +-0 or x is a subnormal number, then sin(x) = x */
XX 	if (z.bits.exp == 0)
XX 		return (x);

This is missing not only the comment about setting inexact, but also
the code.  This is also poorly optimized.

XX 	/* Optimize the case where x is already within range. */
XX 	if (z.e < M_PI_4) {
XX 		hi = __kernel_sinl(z.e, 0, 0);
XX 		RETURNI(s ? -hi : hi);
XX 	}

This is is considerably more broken.  It underflows for small x.  Evaluation
of polynomials always depends on not doing it for tiny x, since evaluation
of higher terms always underflows for tiny x even when x is not denormal
(except when the poly is linear, the T1*x term doesn't underflow if either
T1 is 0 or T1 is larger than about 1 and x is not denormal).

Filtering out tiny x is just an optimization, but prevents this underflow.
It sometimes also gives free tests for 0 and denormals (I don't see how
to fold this test into the others), and free optimizations for tiny x.  It
does do this for sin(x), provided we don't support non-standard rounding
modes.  The correct value is just x with inexact if x != 0.

s_cosl.c has the same bug.  I sent you preliminary patches in 2008 when
the long double trig functions were committed, but left fixing these bugs
as an exercise.  Actually, I thought I fixed this in cosl() and only left
sinl() as an exercise.

XX diff -u2 s_cosl.c~ s_cosl.c
XX --- s_cosl.c~	Thu May 30 18:17:21 2013
XX +++ s_cosl.c	Tue Sep  6 03:43:57 2016
XX @@ -55,21 +55,17 @@
XX  	long double y[2];
XX  	long double hi, lo;
XX +	int ex;
XX 
XX  	z.e = x;
XX -	z.bits.sign = 0;
XX -
XX -	/* If x = +-0 or x is a subnormal number, then cos(x) = 1 */
XX -	if (z.bits.exp == 0)
XX -		return (1.0);
XX -
XX -	/* If x = NaN or Inf, then cos(x) = NaN. */
XX -	if (z.bits.exp == 32767)
XX -		return ((x - x) / (x - x));
XX +	ex = z.bits.exp;

Mostly style and minor efficiency fixes.

XX 
XX  	ENTERI();
XX 
XX -	/* Optimize the case where x is already within range. */
XX -	if (z.e < M_PI_4)
XX -		RETURNI(__kernel_cosl(z.e, 0));
XX +	if (ex < 0x3ffe || (ex == 0x3ffe && z.bits.manh < 0xc90fdaa2))
XX +		RETURNI(__kernel_cosl(x, 0));

Efficiency fix for the comparison.

But no fix for underflow, etc.

XX +
XX +	/* If x = NaN or Inf, then cos(x) = NaN. */
XX +	if (ex == 32767)
XX +		RETURNI(x - x);
XX 
XX  	e0 = __ieee754_rem_pio2l(x, y);

ISTR breaking s_sinf.c and/or s_cosf.c with the invalid optimization of
just evaluating the polynomial, to avoid the branch for the
micro-optimization.  But it is not juste for an optimization.

Bruce