# Implementation of half-cycle trigonometric functions

Sat Apr 29 00:59:26 UTC 2017

```
On Fri, Apr 28, 2017 at 04:35:52PM -0700, Steve Kargl wrote:
>
> I was just backtracking with __kernel_sinpi.  This gets a max ULP < 0.61.
>
> static const float
> s0hi =  3.14062500e+00f,	/* 0x40490000 */
> s0lo =  9.67653585e-04f,	/* 0x3a7daa22 */
> s1   = -5.16771269e+00f,	/* 0xc0a55de7 */
> s2   =  2.55016255e+00f,	/* 0x402335dd */
> s3   = -5.99202096e-01f,	/* 0xbf19654f */
> s4   =  8.10018554e-02f;	/* 0x3da5e44d */
>
> static inline float
> __kernel_sinpif(float x)
> {
> 	double s;
> 	float p, x2;
> 	x2 = x * x;
> 	p = x2 * (x2 * (x2 * s4 + s3) + s2) + s1;
> 	s = x * (x2 * (double)p + s0hi + s0lo);
> 	return ((float)s);
> }
>

Well, starting with the above and playing the splitting game
with x, x2, and p, I've managed to reduce the max ULP in
the region tested.  Testing against MPFR with sin(pi*x)
computed in 5*24-bit precision gives

MAX ULP: 0.73345101
Total tested: 33554427
0.7 < ULP <= 0.8: 90
0.6 < ULP <= 0.7: 23948

Exhaustive testing with my older sinpi(x) as the reference
gives

./testf -S -m 0x1p-14 -M 0.25 -d -e
MAX ULP: 0.73345101
Total tested: 100663296
0.7 < ULP <= 0.8: 45
0.6 < ULP <= 0.7: 11977

The code is slightly slower than my current best kernel.
sinpif time is 0.0147 usec per call (60 cycles).

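static inline float
__kernel_sinpif(float x)
{
	float p, phi, x2, x2hi, x2lo, xhi, xlo;
	uint32_t ix;

	/* p = s1 + s2*x^2 + s3*x^4 + s4*x^6 */
	x2 = x * x;
	p = x2 * (x2 * (x2 * s4 + s3) + s2) + s1;

	/* phi = p with the low 14 bits of its word cleared. */
	GET_FLOAT_WORD(ix, p);
	SET_FLOAT_WORD(phi, (ix >> 14) << 14);

	/* x2hi = x2 with the low 14 bits of its word cleared. */
	GET_FLOAT_WORD(ix, x2);
	SET_FLOAT_WORD(x2hi, (ix >> 14) << 14);

	/* x2hi + x2lo ~= s0lo + x2 * p, with x2hi the leading product. */
	x2lo = s0lo + x2 * (p - phi) + (x2 - x2hi) * phi;
	x2hi *= phi;

	/*
	 * Split x the same way, then sum the terms of
	 * (xhi + xlo) * (s0hi + x2hi + x2lo), small terms first.
	 */
	GET_FLOAT_WORD(ix, x);
	SET_FLOAT_WORD(xhi, (ix >> 14) << 14);
	xlo = x - xhi;
	xlo = xlo * (x2lo + x2hi) + (xlo * s0hi + xhi * x2lo);

	return (xlo + xhi * x2hi + xhi * s0hi);
}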

--
Steve
```
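
The "splitting game" in the kernel above leans on `GET_FLOAT_WORD` and `SET_FLOAT_WORD`, which I am assuming are the usual macros from FreeBSD's msun `math_private.h`. For readers without that header, here is a minimal, portable sketch of the same hi/lo split. The helper names `split_hi` and `split` are hypothetical and do not appear in the post; `memcpy` stands in for the macros.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not from the post): return x with the low 14 bits
 * of its IEEE-754 single-precision word cleared.  memcpy replaces the
 * GET_FLOAT_WORD/SET_FLOAT_WORD macros used in the kernel.
 */
static float
split_hi(float x)
{
	uint32_t ix;

	memcpy(&ix, &x, sizeof(ix));	/* read the bit pattern */
	ix = (ix >> 14) << 14;		/* keep sign, exponent, top 9 fraction bits */
	memcpy(&x, &ix, sizeof(x));	/* write the truncated pattern back */
	return (x);
}

/*
 * Hypothetical helper: split x into hi and lo so that hi + lo == x.
 * hi carries at most 10 significant bits, so a product of two such
 * hi parts needs at most 20 bits and fits in a float's 24-bit
 * significand (barring overflow or underflow).
 */
static void
split(float x, float *hi, float *lo)
{
	*hi = split_hi(x);
	*lo = x - *hi;		/* the bits that were cleared from x */
}
```

Calling `split(x, &xhi, &xlo)` reproduces the `xhi`/`xlo` pair computed in the kernel. Note that `s0hi` (0x40490000) already has its low 14 bits clear, which is presumably why products such as `xhi * s0hi` are kept as separate terms.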