Implementation of half-cycle trigonometric functions

Fri Apr 28 09:39:45 UTC 2017

On Thu, 27 Apr 2017, Steve Kargl wrote:

> On Thu, Apr 27, 2017 at 04:14:11PM -0700, Steve Kargl wrote:
>>
>> I have attached a new diff to the bugzilla report.  The
>> diff is 3090 lines and won't be broadcast to the mailing list.
>>
>> This diff includes fixes for a few inconsequential bugs
>> and implements a modified Estrin's method for summing a few
>> polynomials.  If you want the previous Horner's method,
>> then add -DHORNER to your CFLAGS.
>
> For those curious about testing, here are some numbers
> for Estrin's method.  These were obtained on an AMD
> FX(tm)-8350 @ 4018.34-MHz.  The times are in microseconds
> and the numbers in parentheses are estimated cycles.
>
>            |    cospi     |    sinpi     |    tanpi
> ------------+--------------+--------------+--------------
> float       | 0.0089 (37)  | 0.0130 (54)  | 0.0194 (80)
> double      | 0.0134 (55)  | 0.0160 (66)  | 0.0249 (102)
> long double | 0.0557 (228) | 0.0698 (287) | 0.0956 (393)
> ------------+--------------+--------------+--------------
>
> The timing tests are done on the interval [0,0.25] as
> this is the interval where the polynomial approximations
> apply.  Limited accuracy testing gives
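
For readers unfamiliar with the two summation schemes named in the
quoted text, here is a minimal sketch.  It is the textbook form of each
scheme, not the code from the diff (the "modified" Estrin variant is not
described in this thread), and the names poly_horner and poly_estrin8
are hypothetical.

```c
#include <stddef.h>

/*
 * Horner's method: c[0] + x*(c[1] + x*(c[2] + ...)).
 * Every step depends on the previous one, so the multiply-add
 * latencies add up along a single dependency chain.
 */
double
poly_horner(const double *c, size_t n, double x)
{
    double r = 0.0;

    while (n-- > 0)
        r = r * x + c[n];
    return r;
}

/*
 * Textbook Estrin's method for 8 coefficients (degree 7).
 * The four pairs are independent of each other, so a superscalar
 * CPU can overlap them.  The price is a different rounding pattern
 * from Horner's and a few extra multiplications for x2 and x4.
 */
double
poly_estrin8(const double c[8], double x)
{
    double x2 = x * x;
    double x4 = x2 * x2;
    double p01 = c[0] + c[1] * x;
    double p23 = c[2] + c[3] * x;
    double p45 = c[4] + c[5] * x;
    double p67 = c[6] + c[7] * x;
    double lo = p01 + p23 * x2;
    double hi = p45 + p67 * x2;

    return lo + hi * x4;
}
```

Both functions return the same value when every intermediate result is
exactly representable.  In general they can differ in the last bits,
so a switch like -DHORNER changes accuracy as well as speed.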

These still seem slow.  I did a quick test of naive evaluations of
these functions as standard_function(Pi * x) and got times a bit faster
on Haswell, except 2-4 times faster for the long double case, despite
the handicaps of using gcc-3.3.3 and i386.  Of course, the naive
evaluation is inaccurate, especially near multiples of Pi/2.

> x in [0,0.25]   |   tanpif   |   tanpi    |   tanpil
> -----------------+------------+------------+-------------
>         MAX ULP | 1.37954760 | 1.37300168 | 1.38800823

Just use the naive evaluation to get similar errors in this
range.  It is probably faster too.  For tiny x, both reduce
to the approximation Pi*x, with an error like this expected
unless the evaluation is done in extra precision.

> In the interval [0.25,0.5] tanpi[fl] is computed by
> cospi / sinpi.  The numbers look like
>
> x in [0.25,0.5] |   tanpif   |   tanpi    |   tanpil
> -----------------+------------+------------+-------------
>         MAX ULP | 1.93529165 | 2.04485533 | 1.95823689

The errors build up only linearly in the number of operations,
which is good.

Note that on i386 with its extended precision, the naive method in
float precision is accurate to nearly 0.5 ulps provided you use
extended precision for Pi, the multiplication, and also the function,
so sinpif() is only worth having if it can do this almost as fast
as sinf() (about 15 cycles throughput and less than 100 cycles latency
(50?) on modern x86).  sinf() uses the extra precision automatically
(by using a double hack; double is not very different from
float+extended on i386).  I think the accuracy is enough for float
precision up to a useful multiple of Pi (suppose double precision
and not full extended, so only 53 bits for Pi, so 29 extra; lose 24
to cancellations and 5 are left, so the accuracy is enough up to about
2**5*Pi).

Bruce