cexpl() (was: Re: Update ENTERI() macro)

Fri Mar 8 19:11:57 UTC 2019

On Sat, Mar 09, 2019 at 05:02:57AM +1100, Bruce Evans wrote:
> On Fri, 8 Mar 2019, Steve Kargl wrote:
> > On Sat, Mar 09, 2019 at 12:51:07AM +1100, Bruce Evans wrote:
> >> On Thu, 7 Mar 2019, Steve Kargl wrote:
> >>> On Thu, Mar 07, 2019 at 05:36:22PM +1100, Bruce Evans wrote:
> >>>> On Wed, 6 Mar 2019, Steve Kargl wrote:
> >>>>>	/*
> >>>>>	 *  exp_ovfl = 11356.5234062941439497, 0x400c, 0xb17217f7d1cf79ac
> >>>>>	 * cexp_ovfl = 22755.3287906024445633, 0x400d, 0xb1c6a8573de9768c
> >>>>>	 */
> >>
> >> I checked that these are correct.  Any rounding should be down for
> >> exp_ovfl and up for cexp_ovfl.
> >>
> >>>>> 	if ((hx == 0x400c && lx > 0xb17217f7d1cf79acULL) ||
> >>>>> 	    (hx == 0x400d && lx < 0xb1c6a8573de9768cULL)) {
> >>>>
> >>>> Don't duplicate the hex constants in the comment. The limits can be fuzzy
> >>>> so don't need so many decimal digits or nonzero nonzero bits or LD80C()
> >>>> to round them (s_cexp.c uses only 1 decimal digit and doesn't test the
> >>>> lower 32 bits in the mantissa; here we have more bits than we need in lx).
> >>>
> >>> Being fuzzy was the movitation for the new macro, I was only using
> >>> the upper 32 bits of lx in the original cexpl() I posted.
> >>
> >> IIRC, that was done by extracting into a 32-bit lx.  This could be written
> >> as (lx >> 32) > 0xb17217f7 (or a fuzzier value), and it might be better
> >> to do that anyway.
> >
> > This is what I do in my WIP ccosh.  There are 3 thresholds
> > of ~22.9, ~11356, and ~22755.  I'm casting the result to
> > uint32_t, so something like (uint32_t)(lx >> 32) > 0xb17000000.
> > Do I need the cast and/or do you prefer it?
> 
> I think you can even round to a power of 2 and not look at any mantissa
> bits to classify this, as for the 64 in coshl().

You used your k_hexpl() and take advantage of the hi+lo splitting
to use x < 64.

> 22.9 here seems wrong.  It is approximately the threshold where cosh()
> decays to exp().  The threshold for coshl() delay is much higher.
> coshl() uses 64, which is high enough for both ld80 and ld128.  More
> like 32 would work for ld80.

I actually round up to 23 for ld80, and haven't thought too much
about ld128.  A value of 22 is used for s_ccosh.c, which isn't
quite large enough for long double. (Yes, I looked at bits)

% ./gen 22
coshl(22)    = 1.79245642306579578097e+09
expl(22) / 2 = 1.79245642306579578086e+09
sinhl(22)    = 1.79245642306579578074e+09

% ./gen 23
coshl(23)    = 4.87240172312445130013e+09
expl(23) / 2 = 4.87240172312445130013e+09
sinhl(23)    = 4.87240172312445130013e+09

in s_ccoshl.c I then have

	if (ix < 0x4003 ||      /* |x| < ~23: normal case */
	    (ix == 0x4003 && (uint32_t)(lx >> 32) < 0xb8000000))
		RETURNI(CMPLXL(coshl(x) * c, sinhl(x) * s));

	/* |x| >= 23, so cosh(x) ~= exp(|x|) / 2 */
	if (ix < 0x400c ||
	    (ix == 0x400c && (uint32_t)(lx >> 32) <= 0xb1700000)) {
		/* x < 11356: exp(|x|) won't overflow */
		h = expl(fabsl(x)) / 2;
		RETURNI(CMPLXL(h * c, copysignl(h, x) * s));

With |x| < 23, we have 2 function calls for coshl() and sinhl()
as opposed to the 1 function call for expl() in 23 <= x <= 32 range.
I can write the first conditional as

	if (ix < 0x4004)     /* |x| < 32: normal case */
		RETURNI(CMPLXL(coshl(x) * c, sinhl(x) * s));

This then raises the question on whether we should change the
limit to 32 in the double complex ccosh()?

-- 
Steve