Implementation of half-cycle trignometric functions
Steve Kargl
sgk at troutmask.apl.washington.edu
Sat Apr 29 00:59:26 UTC 2017
On Fri, Apr 28, 2017 at 04:35:52PM -0700, Steve Kargl wrote:
>
> I was just backtracking with __kernel_sinpi. This gets a max ULP < 0.61.
>
> static const float
> s0hi = 3.14062500e+00f, /* 0x40490000 */
> s0lo = 9.67653585e-04f, /* 0x3a7daa22 */
> s1 = -5.16771269e+00f, /* 0xc0a55de7 */
> s2 = 2.55016255e+00f, /* 0x402335dd */
> s3 = -5.99202096e-01f, /* 0xbf19654f */
> s4 = 8.10018554e-02f; /* 0x3da5e44d */
>
> static inline float
> __kernel_sinpif(float x)
> {
> double s;
> float p, x2;
> x2 = x * x;
> p = x2 * (x2 * (x2 * s4 + s3) + s2) + s1;
> s = x * (x2 * (double)p + s0hi + s0lo);
> return ((float)s);
> }
>
Well, starting with above and playing the splitting game
with x, x2, and p. I've manage to reduce the max ULP in
the region testd. Testing against MPFR with sin(pi*x)
computed in 5*24-bit precision gives
MAX ULP: 0.73345101
Total tested: 33554427
0.7 < ULP <= 0.8: 90
0.6 < ULP <= 0.7: 23948
Exhaustive testing with my older sinpi(x) as the reference
gives
./testf -S -m 0x1p-14 -M 0.25 -d -e
MAX ULP: 0.73345101
Total tested: 100663296
0.7 < ULP <= 0.8: 45
0.6 < ULP <= 0.7: 11977
The code is slightly slower than my current best kernel.
sinpif time is 0.0147 usec per call (60 cycles).
static inline float
__kernel_sinpif(float x)
{
float p, phi, x2, x2hi, x2lo, xhi, xlo;
uint32_t ix;
x2 = x * x;
p = x2 * (x2 * (x2 * s4 + s3) + s2) + s1;
GET_FLOAT_WORD(ix, p);
SET_FLOAT_WORD(phi, (ix >> 14) << 14);
GET_FLOAT_WORD(ix, x2);
SET_FLOAT_WORD(x2hi, (ix >> 14) << 14);
x2lo = s0lo + x2 * (p - phi) + (x2 - x2hi) * phi;
x2hi *= phi;
GET_FLOAT_WORD(ix, x);
SET_FLOAT_WORD(xhi, (ix >> 14) << 14);
xlo = x - xhi;
xlo = xlo * (x2lo + x2hi) + (xlo * s0hi + xhi * x2lo);
return (xlo + xhi * x2hi + xhi * s0hi);
}
--
Steve
20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4
20161221 https://www.youtube.com/watch?v=IbCHE-hONow
More information about the freebsd-hackers
mailing list