cexpl() (was: Re: Update ENTERI() macro)
Bruce Evans
brde at optusnet.com.au
Fri Mar 8 20:53:33 UTC 2019
On Fri, 8 Mar 2019, Steve Kargl wrote:
> On Sat, Mar 09, 2019 at 05:02:57AM +1100, Bruce Evans wrote:
>> On Fri, 8 Mar 2019, Steve Kargl wrote:
>>> On Sat, Mar 09, 2019 at 12:51:07AM +1100, Bruce Evans wrote:
>>>> On Thu, 7 Mar 2019, Steve Kargl wrote:
>>>> ...
>>>> IIRC, that was done by extracting into a 32-bit lx. This could be written
>>>> as (lx >> 32) > 0xb17217f7 (or a fuzzier value), and it might be better
>>>> to do that anyway.
>>>
>>> This is what I do in my WIP ccosh. There are 3 thresholds
>>> of ~22.9, ~11356, and ~22755. I'm casting the result to
>>> uint32_t, so something like (uint32_t)(lx >> 32) > 0xb17000000.
>>> Do I need the cast and/or do you prefer it?
>>
>> I think you can even round to a power of 2 and not look at any mantissa
>> bits to classify this, as for the 64 in coshl().
>
> You used your k_hexpl() and take advantage of the hi+lo splitting
> to use x < 64.
Or I designed my k_hexpl() to support this splitting up to x < 64. Long
double precision is easier because it has (committed) support functions
like k_hexpl().
>> 22.9 here seems wrong. It is approximately the threshold where cosh()
>> decays to exp(). The threshold for coshl() delay is much higher.
>> coshl() uses 64, which is high enough for both ld80 and ld128. More
>> like 32 would work for ld80.
>
> I actually round up to 23 for ld80, and haven't thought too much
> about ld128. A value of 22 is used for s_ccosh.c, which isn't
> quite large enough for long double. (Yes, I looked at bits)
>
> % ./gen 22
> coshl(22) = 1.79245642306579578097e+09
> expl(22) / 2 = 1.79245642306579578086e+09
> sinhl(22) = 1.79245642306579578074e+09
>
> % ./gen 23
> coshl(23) = 4.87240172312445130013e+09
> expl(23) / 2 = 4.87240172312445130013e+09
> sinhl(23) = 4.87240172312445130013e+09
Look at more bits :-). For r + 1/r, ignoring the 1/r term gives an error
of about 1 ulp in ld80 precision when 1/r/r is about 2**-64. This gives
a threshold of about log(2**(64/2)) = 22.18 for accuracy of about 1 ulp.
But we want more accuracy than that. For 0.5 ulps, log(2**(65/2) = 22.52+.
We want more than that too. expl() gives almost 10 extra bits of accuracy,
and the hyperbolic functions use the expl kernel to get all these extra
bits. expl(x) itself uses a threshold of 2**-75 instead of 2**-65 for
x being so tiny that the x**2/2 term and higher terms are negligible
compared with the x term. So we should try for 11 extra bits here. This
gives a threshold of log(2**(75/2)) = 25.99++.
ld128 expl() is missing the change corresponding to changing 65 to 75 for
the x vs x**2/2 threshold. It still uses 114. After fixing this to 124,
the corresponding threshold here is log(2**124/2) = 42.97+.
> in s_ccoshl.c I then have
>
> if (ix < 0x4003 || /* |x| < ~23: normal case */
> (ix == 0x4003 && (uint32_t)(lx >> 32) < 0xb8000000))
> RETURNI(CMPLXL(coshl(x) * c, sinhl(x) * s));
k_hexpl() can't handle this case. (But if you want efficiency or accuracy,
then you should implement sinhcoshl() or just use the kernel to evaluate
exp(x) only once.)
I think this formula works up to 32 or 64 or even up to the overflow
threshold. coshl(x) and sinhl() decay to expl(x) starting a bit above
23, but, but these functions can optimize that almost as well as here
(they actually don't bother to until x reaches 64).
You can use this to eliminate the check of lx.
>
> /* |x| >= 23, so cosh(x) ~= exp(|x|) / 2 */
> if (ix < 0x400c ||
> (ix == 0x400c && (uint32_t)(lx >> 32) <= 0xb1700000)) {
> /* x < 11356: exp(|x|) won't overflow */
> h = expl(fabsl(x)) / 2;
> RETURNI(CMPLXL(h * c, copysignl(h, x) * s));
Use a power of 2 for the upper threshold too if possible.
> With |x| < 23, we have 2 function calls for coshl() and sinhl()
> as opposed to the 1 function call for expl() in 23 <= x <= 32 range.
> I can write the first conditional as
>
> if (ix < 0x4004) /* |x| < 32: normal case */
> RETURNI(CMPLXL(coshl(x) * c, sinhl(x) * s));
The general formula is indeed unecessarily slow in that range. Also
for the more important range [1, 23]. coshl() in the lower range is
lo + 0.25/(hi + lo) + hi after h_expl() to get hi and lo. Similarly
for sinhl(). These formulas are short enough to repeat in the above,
but I think this optimization is excessive. It is better for accuracy
after some rearrangement.
> This then raises the question on whether we should change the
> limit to 32 in the double complex ccosh()?
Do you mean from 64 to 32 the non-complex cosh(), or from your current
limit to the above?
Bruce
More information about the freebsd-numerics
mailing list