Implementation of half-cycle trignometric functions
Steve Kargl
sgk at troutmask.apl.washington.edu
Thu May 18 17:54:37 UTC 2017
On Thu, May 18, 2017 at 04:34:57PM +1000, Bruce Evans wrote:
> On Wed, 17 May 2017, Steve Kargl wrote:
>
> > On Thu, May 18, 2017 at 09:25:58AM +1000, Bruce Evans wrote:
> >> On Wed, 17 May 2017, Steve Kargl wrote:
> > ...
> >>> As such, I've added a comment
> >>>
> >>> /*
> >>> * pi_hi contains the leading 56 bits of a 169 bit approximation for pi.
> >>> */
> >>
> >> Why 56? 53 + 56 = 109.
> >
> > This is ld128.
>
> ld128 has precision 113.
Yes. I know.
> > static const long double
> > pi_hi = 3.14159265358979322702026593105983920e+00L,
> > pi_lo = 1.14423774522196636802434264184180742e-17L;
> >
> > pi_hi has 113/2 = 56 bits.
> > pi_lo has 113 bit.
> >
> > 56 + 113 = 169
>
> But hi is set intentionally sloppily by converting to double, so it has
> 53 bits instead of the 57 that 56 would go with (57 = 113/2 rounded up).
A 53-bit entity times a 56-bit entity needs 109-bits for the result.
A 113-bit entity can hold a 109-bit entity. Or am I missing something?
> >> My improvements to lgamma()'s sin_pi*() are buggy. I forgot the original
> >> target of avoiding inexact better:
> >
> > I'll need to wait until the weekend to study your improved sin_pi.
>
> I now have closer to production version. sinpi() runs more than 2
> times faster that your version, without doing much different except
> not having so much EXTRACT/INSERT or special cases. (66 cycles instead
> of 150 on Haswell-i386, and the 66 can be reduced to 58 by removing
> the STRICT_ASSIGN() which breaks only i386 with non-default extra
> precision. The difference would be smaller on amd64 or better compilers
> , but Haswell-i386 has much the same slownesses as old Athlon from the
> EXTRACTs and INSERTs.
>
> However, both versions fail tests. Min has a max error of 2048 ulps
> and an everage error of 928 ulps and yours has a maximum error of many
> giga-ups for the sign bit wrong and an average error of 561 ulps. The
> enormous errors are probably for denormals or NaNs.
You're correct that I did not give much consideration to subnormal
number. For float this, can be fixed by
if (ix < 0x38800000) { /* |x| < 0x1p-14 */
if (x == 0)
return (x);
SET_FLOAT_WORD(hi, hx & 0xffff0000);
if (ix <= 0x0b7f0001) { /* Avoid spurios underflow. */
hi *= 0x1p23F;
lo = x * 0x1p23F - hi;
s = (pi_lo + pi_hi) * lo + pi_lo * hi +
pi_hi * hi;
return (s * 0x1p-23F);
}
lo = x - hi;
s = (pi_lo + pi_hi) * lo + pi_lo * hi +
pi_hi * hi;
return (s);
}
or if you want to avoid the extra if-statement. Simply scale
if (ix < 0x38800000) { /* |x| < 0x1p-14 */
if (x == 0)
return (x);
SET_FLOAT_WORD(hi, hx & 0xffff0000);
hi *= 0x1p23F;
lo = x * 0x1p23F - hi;
s = (pi_lo + pi_hi) * lo + pi_lo * hi +
pi_hi * hi;
return (s * 0x1p-23F);
}
Starting with x = 0x1p-14 and using nextafterf(x,-1.f), the
first value where underflow is signalled is
% ./testf -S -D
sinpif(3.74171353e-39) = 1.17549407e-38
x is subnormal
sinpif(x) is subnormal
Exhaustive testing of numbers in the domain [FLT_MIN, 0x1p-14] yields
% ./testf -S -m 1.17549435E-38F -M 0x1p-14F -ed
MAX ULP: 0.58366567
Total tested: 939524096
--
Steve
20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4
20161221 https://www.youtube.com/watch?v=IbCHE-hONow
More information about the freebsd-numerics
mailing list