Implementation of half-cycle trignometric functions
Bruce Evans
brde at optusnet.com.au
Sat Apr 29 19:09:39 UTC 2017
On Sat, 29 Apr 2017, Steve Kargl wrote:
> On Sat, Apr 29, 2017 at 05:54:21PM +1000, Bruce Evans wrote:
>> On Fri, 28 Apr 2017, Steve Kargl wrote:
> ...
>>> GET_FLOAT_WORD(ix, p);
>>> SET_FLOAT_WORD(phi, (ix >> 14) << 14);
>>>
>>> GET_FLOAT_WORD(ix, x2);
>>> SET_FLOAT_WORD(x2hi, (ix >> 14) << 14);
>>
>> I expect that these GET/SET's are the slowest part. They are quite fast
>> in float prec, but in double prec on old i386 CPUs compilers generate bad
>> code which can have penalties of 20 cycles per GET/SET.
>>
>> Why the strange reduction? The double shift is just a manual optimization
>> or pssimization (usually the latter) for clearing low bits. Here it is
>> used to clear 14 low bits instead of the usual 12. This is normally
>> written using just a mask of 0xffff0000, unless you want a different
>> number of bits in the hi terms for technical reasons. Double precision
>> can benefit more from asymmetric splitting of terms since 53 is not
>> divisible by 2; 1 hi term must have less than 26.5 bits and the other term
>> can hold an extra bit.
>
> Because I didn't think about using a mask. :-)
>
> It's easy to change 14 to 13 or 11 or ..., while I would
> need to write out zeros and one to come up with 0xffff8000,
> etc.
Here are some examples of more delicate splittings from the uncommitted
clog*(). They are usually faster than GET/SET, but slower than converting
to lower precision as is often possible for double precision and ld128
only. clog*() can't use the casting method since it needs to split in the
middle, and doesn't use GET/SET since it is slow. It uses methods that
only work on args that are not too large or too small, and uses a GET
earlier to classify the arg size.
XX double prec:
XX double_t ax, ax2h, ax2l, t;
XX
XX t = (double)(ax * (0x1p27 + 1));
XX axh = (double)(ax - t) + t;
XX axl = ax - axh;
Splitting in the middle is required to calculate ax*ax exactly. Since the
middle bit is 26.5, splitting in the middle is impossible, but there are
tricks to make spitting at bit 26 or 27 work just as well.
Multipling by 0x1p27 shifts to a place where the lower bits are canceled
by the subtraction. Adding 1 is a trick to create a extra (virtual) bit
needed for the exact multiplication.
At least ax needs to have the extra-precision type double_t so that casting
to double works. Otherwise, this code would need STRICT_ASSIGN() to work
around compiler bugfreatures. We assign back to double_t for efficiency.
XX float prec:
XX float_t ax, axh, axl, t;
XX
XX t = (float)(ax * (0x1p12F + 1));
XX axh = (float)(ax - t) + t;
XX axl = ax - axh;
This is simpler because there is a middle. I think adding 1 is not needed.
XX long double double prec:
XX #if LDBL_MANT_DIG == 64
XX #define MULT_REDUX 0x1p32 /* exponent MANT_DIG / 2 rounded up */
XX #elif LDBL_MANT_DIG == 113
XX #define MULT_REDUX 0x1p57
XX #else
XX #error "Unsupported long double format"
XX #endif
XX
XX long double ax, axh, axl, t;
XX
XX /* Calculate ax*ax and ay*ay exactly using Dekker's algorithm. */
XX t = (long double)(ax * (MULT_REDUX + 1));
XX axh = (long double)(ax - t) + t;
XX axl = ax - axh;
This is more unsimpler because the middle depends on a parameter and it
is fractional in some cases. If adding 1 is not needed for the non-
fractional case, then it should be added in MULT_REDUX in the fractional
case only.
Several functions including expl and e_rem_pio* use this REDUX method
with the parameter hard-coded in a different way, usually with a
slower STRICT_ASSIGN() and a special conversion to an integer or integer
divided by a power of 2. The above is supposed to be more careful with
extra precision so that STRICT_ASSIGN() is not needed.
The calculation of axh may have problems with double rounding, but I think
this doesn't matter here -- whatever axh is, the rest ends up in axl and
we just need to ensure that both have 27 lower bits zero.
Bruce
More information about the freebsd-numerics
mailing list