Bit twiddling question

Bruce Evans brde at optusnet.com.au
Thu Mar 9 18:16:03 UTC 2017


On Thu, 9 Mar 2017, Steve Kargl wrote:

> On Wed, Mar 08, 2017 at 11:52:36PM -0800, Steve Kargl wrote:
>> To give a hint at what I have been working on, I have implementations
>> for sinpif(x), sinpi(x), cospif(x), and cospi(x).  For sinpif(x) and
>> cospif(x) I have kernels that give correctly rounded results for
>> FLT_MIN <= x < 0x1p23 (at least on x86_64).   I also have slightly

I didn't see the original mail.

I'd prefer you to finish the long double versions of cosl(), sinl() and
tanl() (maybe das is responsible for the latter).  These are still too
slow and minor bugs for at least sinl() on small x, partly by not copying
the lower-precision versions enough.

>> faster kernels that give max ULPs of about 0.5008.  For sinpi(x) and

Similar to for sinf() and cosf().  The maximum for those is 0.5009, using
double precision calculations which I now know to be a mistake, and
a polynomial of degree higher than necessary for good enough rounding in
float precision and lower than necessary for perfect rounding in float
precision.

>> cospi(x), my kernels currently give about 1.25 ULP for the worse case
>> error if I accumulate the polynomials in double precision.  If I

Not too good.  I think the problem is that the coeffs for the lowest
terms are no longer exactly representable.  The linear function f(x) = a*x
is harder to evaluate accurately than exp(x) = 1 + x + ..., unless a
is exactly representable and also  has few digits (like 1 or -0.5 for
early coeffs of sin() and cos()).

>> accumulate the polynomials in long double precision, then I get
>> correctly rounded results.  To complete the set, I was hoping to

Don't do that.  It is much slower, and not available starting with
long double precision, or even starting with double precision on arches
without real long doubles.  It is also a mistake to use it for float
precision like I did for trig functions and an uncommitted version of
logf().  I managed to make the specialized logf() run 50% faster than
the less specialized log() (also uncommitted), by using double precision
and manually schedling for Athlon-XP, but modern CPUs and compilers handle
the less sprecialized log() so well that the specialized logf() is only
a couple of cycles faster.  Similarly but less pronounced for trig functions.

>> work out ld80 and ld128 versions.  `ld128 is going to be problematic
>> due to the absense of int128_t.
>>
>> I'll send you what I have in a few days.
>
> It is sometime amazing what happens when I sleep.  For sinpi[fl](x)
> and the method I came up with one needs to do argument reduction
> where x = n + r.  Here n is an integer and 0 < r < 1.  If n is even
> then one is in the positive half-cyle of sin(x) and if n is odd then
> one has the negative half-cycle.  For sinpif(x) this looks like
>        if (ix < 0x4b000000) {          /* 1 <= |x| < 0x1p23 */
> /* 1 */         ix = (uint32_t)ax;
> /* 2 */         ax = ax == ix ? zero : __compute_sinpif(ax - ix);
> /* 3 */         if (ix & 1) ax = -ax;
> /* 4 */         return ((hx & 0x80000000) ? -ax : ax);
>        }
>
> Line 1 determines n.  Line 2 computes either sinpi(n) exactly or
> sinpi(r).  Line 3 uses n odd to set the sign for the half-cycle.
> Line 4 is used to set the sign from sin(-x) = -sin(x).
>
> For double and ld80 one can use int64_t instead of uint32_t.  Last
> night I realized that I don't need to use int64_t, and more importantly
> I don't need int128_t.  The argument reduction can use uint32_t
> everywhere.  For double, I currently have
>
>        if (ix < 0x43300000) {          /* 1 <= |x| < 0x1p52 */
>                int64_t n;
>                n = (int64_t)ax;
>                ax = ax == n ? zero : __compute_sinpi(ax - n);
>                if (n & 1) ax = -ax;
>                return ((hx & 0x80000000) ? -ax : ax);
>        }
>
> which can be written (off the top-of-head and not tested)
>        if (ix < 0x43300000) {          /* 1 <= |x| < 0x1p52 */
>                double volatile vx;
>                uint32_t n;
>                vx = ax + 0x1p52;
>                vx = vx - 0x1p52;
>                if (vx == ax) return(0);
>                if (vx > UINT32_MAX) {   /* ax = m + n + r with m + n */
>                    vx -= UINT32_MAX;    /* m = UINT32_MAX.  n is in range */
>                    ax -= UINT32_MAX;
> 				}
>                n = (uint32_t)vx;
>                ax = __compute_sinpi(ax - n);
>                if (n & 1) ax = -ax;
>                return ((hx & 0x80000000) ? -ax : ax);
>        }
>
> Something similar can be applied to ld128, but reduction may take
> two rounds (ie., a comparison with UINT64_MAX and then UINT32_MAX).

I don't like that much.  The 0x1p52 magic is tricky and probably buggy.
s_rint.c uses the same magic but needs many complications to avoid
double rounding.  Similar-looking magic for trig reduction in radians
uses floating the magic value of 0x1.8p52 to bias rounding errors in
a safe way, and arranges so that rounding errors don't matter then.
Perhaps you can do the same here -- do a fixup of +-UINT32_MAX if
necessary.  This seem to be no harder than using rint(), round() or
floor().  Except for floor(), we it is hard to control whether the
rounded result is too high or too low.  Using floor:

 	prec_t n = floor(x);
 	r = x - n;

but we want to classify quadrants, and still don't have an n that will
fit in an integer, so first scale by 2 (4 quadrants of size 1/2):

 	y = x * 0.5 - floor(x * 0.5);	/* 4 quads in [2n, 2n+2) -> [0, 1) */
 	y = 4 * y;			/* -> [0, 4) */
 	int n = y;			/* quadrant */
 	r = (y - n) / 4;		/* offset in quadrant */

This is a bit slow, and too much like lgamma's method.  frexp() can be used
instead of floor() to slowly extract the fraction.  I think it can be done
using the 0x1p52 magic a bit like you say.  s_rint.c already uses this
magic, but has to work hard to handle errors in double rounding.  s_floor.c
uses it via bit fiddling.

If we replace the slow floor(z) by the 0x1p52 calculation on z, then we
get an approximation to floor(z) that is hopefully an integer (possibly
an integer + 0.5) within 1 or 2 of the correct result.  Adjust it
until it is correct.

Bruce


More information about the freebsd-numerics mailing list