cexpl() (was: Re: Update ENTERI() macro)
Steve Kargl
sgk at troutmask.apl.washington.edu
Fri Mar 8 19:11:57 UTC 2019
On Sat, Mar 09, 2019 at 05:02:57AM +1100, Bruce Evans wrote:
> On Fri, 8 Mar 2019, Steve Kargl wrote:
> > On Sat, Mar 09, 2019 at 12:51:07AM +1100, Bruce Evans wrote:
> >> On Thu, 7 Mar 2019, Steve Kargl wrote:
> >>> On Thu, Mar 07, 2019 at 05:36:22PM +1100, Bruce Evans wrote:
> >>>> On Wed, 6 Mar 2019, Steve Kargl wrote:
> >>>>> /*
> >>>>> * exp_ovfl = 11356.5234062941439497, 0x400c, 0xb17217f7d1cf79ac
> >>>>> * cexp_ovfl = 22755.3287906024445633, 0x400d, 0xb1c6a8573de9768c
> >>>>> */
> >>
> >> I checked that these are correct. Any rounding should be down for
> >> exp_ovfl and up for cexp_ovfl.
> >>
> >>>>> if ((hx == 0x400c && lx > 0xb17217f7d1cf79acULL) ||
> >>>>> (hx == 0x400d && lx < 0xb1c6a8573de9768cULL)) {
> >>>>
> >>>> Don't duplicate the hex constants in the comment. The limits can be fuzzy
> >>>> so don't need so many decimal digits or nonzero nonzero bits or LD80C()
> >>>> to round them (s_cexp.c uses only 1 decimal digit and doesn't test the
> >>>> lower 32 bits in the mantissa; here we have more bits than we need in lx).
> >>>
> >>> Being fuzzy was the movitation for the new macro, I was only using
> >>> the upper 32 bits of lx in the original cexpl() I posted.
> >>
> >> IIRC, that was done by extracting into a 32-bit lx. This could be written
> >> as (lx >> 32) > 0xb17217f7 (or a fuzzier value), and it might be better
> >> to do that anyway.
> >
> > This is what I do in my WIP ccosh. There are 3 thresholds
> > of ~22.9, ~11356, and ~22755. I'm casting the result to
> > uint32_t, so something like (uint32_t)(lx >> 32) > 0xb17000000.
> > Do I need the cast and/or do you prefer it?
>
> I think you can even round to a power of 2 and not look at any mantissa
> bits to classify this, as for the 64 in coshl().
You used your k_hexpl() and take advantage of the hi+lo splitting
to use x < 64.
> 22.9 here seems wrong. It is approximately the threshold where cosh()
> decays to exp(). The threshold for coshl() delay is much higher.
> coshl() uses 64, which is high enough for both ld80 and ld128. More
> like 32 would work for ld80.
I actually round up to 23 for ld80, and haven't thought too much
about ld128. A value of 22 is used for s_ccosh.c, which isn't
quite large enough for long double. (Yes, I looked at bits)
% ./gen 22
coshl(22) = 1.79245642306579578097e+09
expl(22) / 2 = 1.79245642306579578086e+09
sinhl(22) = 1.79245642306579578074e+09
% ./gen 23
coshl(23) = 4.87240172312445130013e+09
expl(23) / 2 = 4.87240172312445130013e+09
sinhl(23) = 4.87240172312445130013e+09
in s_ccoshl.c I then have
if (ix < 0x4003 || /* |x| < ~23: normal case */
(ix == 0x4003 && (uint32_t)(lx >> 32) < 0xb8000000))
RETURNI(CMPLXL(coshl(x) * c, sinhl(x) * s));
/* |x| >= 23, so cosh(x) ~= exp(|x|) / 2 */
if (ix < 0x400c ||
(ix == 0x400c && (uint32_t)(lx >> 32) <= 0xb1700000)) {
/* x < 11356: exp(|x|) won't overflow */
h = expl(fabsl(x)) / 2;
RETURNI(CMPLXL(h * c, copysignl(h, x) * s));
With |x| < 23, we have 2 function calls for coshl() and sinhl()
as opposed to the 1 function call for expl() in 23 <= x <= 32 range.
I can write the first conditional as
if (ix < 0x4004) /* |x| < 32: normal case */
RETURNI(CMPLXL(coshl(x) * c, sinhl(x) * s));
This then raises the question on whether we should change the
limit to 32 in the double complex ccosh()?
--
Steve
More information about the freebsd-numerics
mailing list