cexpl() (was: Re: Update ENTERI() macro)

Sat Mar 9 03:28:38 UTC 2019

On Fri, 8 Mar 2019, Steve Kargl wrote:

> (reducing quote depth.  I'll need time to digest your bit analysis)
>
> On Sat, Mar 09, 2019 at 07:53:24AM +1100, Bruce Evans wrote:
>> On Fri, 8 Mar 2019, Steve Kargl wrote:
>>
>>> This then raises the question on whether we should change the
>>> limit to 32 in the double complex ccosh()?
>>
>> Do you mean from 64 to 32 the non-complex cosh(), or from your current
>> limit to the above?
>
> I mean ccosh(double complex).  Copyright date is 2005 for
> s_ccosh.c while ld80/*_expl.* has datesi of 2009-2013.  It
> seems you and I developed s_ccosh.c much earlier than the
> Tang-based expl().  In s_ccosh.c, we have
>
> 	if (ix < 0x40360000) /* |x| < 22: normal case */
> 	    return (CMPLX(cosh(x) * cos(y), sinh(x) * sin(y)));
>
> 	/* |x| >= 22, so cosh(x) ~= exp(|x|) */
> 	if (ix < 0x40862e42) {
> 		/* x < 710: exp(|x|) won't overflow */
> 		h = exp(fabs(x)) * 0.5;
> 		return (CMPLX(h * cos(y), copysign(h, x) * sin(y)));
> 	} ...

I see.

> Would it be beneficial to change |x| < 22 to |x| < 32?

No, 22 is correct.  It is copied from cosh() and sinh().  It is
from allowing for exp() to have 10 or 11 extra bits of accuracy
(which it might actually do for a hardware or other exp() with
internal extra precision not limited by the ABI or broken to C11
spec so that it has to destroy the extra etra precision on return).
This gives a threshold of x = log(2**(52+11 or 53+10)) / 2 = 21.83
for one of exp(+-x) being negligible relative to the other.  Take
the ceiling to get 22.

ccoshl() uses the result of cosh(), so must use a threshold of 22
or more to let cosh() do the best that it can.  Similarly for
sinh() -- use a threshold >= the threshold of each.  Each uses 22
so 22 is already large enough.

32 is >= 22, so we can use that too, but it is slower in the
range [22, 32].

cosh() and sinh() usually don't get near 11 extra bits of accuracy.
The threshold for 53 bits is 18.36+.  They are slower than necessary
between this and 22.

This is more complicated for long double precision.  There coshl()
and sinhl() use a threshold of 64 and are slower than necessary between
this and 22.5 to 26 for ld80 and 39.5 and 43 for ld128.  This slowness
is to simplify the code and thus optimize for more usual cases.  The
gap between the thresholds is much larger than for double precision,
so if we copy the individual functions' threshold of 64 then we get
unnecessary slowness over a larger range.  But we shouldn't know so
much about the functions' implementation that we intentionally optimize
differently for this range.  It is bad enough that we have to know their
internal threshold to ensure getting consistent results.

> While we have kernels for exp(), I did not commit your
> Tang-based exp(). So, exp() has ulp of 0.7 to 0.8 instead
> of 0.5xx.  Maybe using 32 won't buy us anything.

My exp() is based on old fdlibm (Ng?) since the intervals method didn't
work as well as expected and I was able to get enough accuracy for <
1 ulp of error in the hyperbolic functions by exporting hi + lo
decompositions using kernels.  exp() still has a maximum error of about
0.75 ulps on amd64.  But on i386, it has a maximum error of 0.5092
ulps in float precision, by using float_t to get all evaluations in
double precision.  In double precision, double_t defaults to giving
only double precision, so the maximum error is usually the same 0.75
as on amd64.  exp() on i386 now uses the i387 and switches it to 64-bit
precision.  The i387 barely delivers double precision for exp() using
the extra precision.  (We almost knew that it couldn't deliver long
double precision for expl() when we implemented expl() in software).

My error table has grown many special cases to record bad old behaviour
for regression tests.  For exp() it is:

exp:   max_er =      0x642 0.7822, avg_er = 0.007, #>=1:0.5 = 0:2566529
exp:   max_er =      0x5fd 0.7485, avg_er = 0.007, #>=1:0.5 = 0:2733369 (s/w,PD)
exp:   max_er =      0x405 0.5024, avg_er = 0.007, #>=1:0.5 = 0:2253093 (s/w,PE)

0.7822 is for i387 exp.  0.7485 is the error for the fdlibm method
without extra precision in hardware (FP_PD on i386).  0.5024 is the
error for a modified fdlibm method with 11 bits of extra precision
(FP_PE onl i386) and double_t instead of double to use this, and some
reordering.

Bruce