Update ENTERI() macro
Bruce Evans
brde at optusnet.com.au
Thu Mar 7 06:36:36 UTC 2019
On Wed, 6 Mar 2019, Steve Kargl wrote:
> On Thu, Mar 07, 2019 at 03:09:06PM +1100, Bruce Evans wrote:
>> On Wed, 6 Mar 2019, Steve Kargl wrote:
>> ...
>> But does it preserve NaN bits as much as possible, and produce the same
>> NaN bits as double precision after conversion to double precision? The
>> report spells NaN and Inf using C99's fuzzy bad names. Even the C99
>> specification spells NaN and Inf correctly in Appendix F.
>>
>> The complex hyperbolic functions have better comments, so are almost
>> an easier place to start.
>
> I suppose it preserves the NaN bits as well as cexpf and cexp as
> it uses the same algorthim.
> ...
> long double complex
> cexpl (long double complex z)
> {
> long double c, exp_x, s, x, y;
> uint64_t lx, ly;
> uint16_t hx, hy, ix, iy;
>
> ENTERI();
>
> x = creall(z);
> y = cimagl(z);
>
> EXTRACT_LDBL80_WORDS(hx, lx, x);
> EXTRACT_LDBL80_WORDS(hy, ly, y);
I think you recently changed to this from your new macro.
>
> ix = hx&0x7fff;
> iy = hy&0x7fff;
Some style bugs. s_cexp.c doesn't use the fdlibm'ish style of sometimes
no spaces around '&'. It puts the ix and iy calculations immediately
after the extractions. They shouldn't be separated since they are
also extractions. c_cexp.c extracts in a different order, and doesn't
use iy. You reordered it a bit to use sincos(). I don't like the
reordering much. But old s_cexp.c has its own style bug of a separating
the extraction for y but not even following its usual style of a blank
line before comments near the extraction of x.
>
> /* cexp(0 + I y) = cos(y) + I sin(y) */
> if ((ix | lx) == 0) {
> sincosl(y, &s, &c);
> RETURNI(CMPLXL(c, s));
> }
>
> /* cexp(x + I 0) = exp(x) + I 0 */
> if ((iy | ly) == 0)
> RETURNI(CMPLXL(expl(x), y));
>
> if (iy == 0x7fff) {
s_cexp.c actually uses a clobbered hy here. Perhaps fix this too (compilers
mostly ignore program order for initializing variables like iy and optimize
to an expression involving the unclobbered hy here if that is optimal. The
extra variables are just for readability or unreadability).
> if (ix < 0x7fff ||
> (ix == 0x7fff && (lx & 0x7fffffffffffffffULL) != 0)) {
> /* cexp(finite|NaN +- I Inf|NaN) = NaN + I NaN */
> RETURNI(CMPLXL(y - y, y - y));
Looks OK.
I think ix and iy are only used here, so it is clearer to not have
variables for them -- just use (hx & 0x7fff), etc.
y - y and similar expressions usually give the right NaN (y = Inf -> dNaN
and y = NaN -> d(y)).
IIRC, the only differences for ld128 is that then normalization bit is
hidden so masking out the top bit in the mantissa is not needed, but the
mantissa is in 2 64-bit words.
> } else if (hx & 0x8000) {
This is clearest as it is with an explicit mask.
> /*
> * exp_ovfl = 11356.5234062941439497, 0x400c, 0xb17217f7d1cf79ac
> * cexp_ovfl = 22755.3287906024445633, 0x400d, 0xb1c6a8573de9768c
> */
> if ((hx == 0x400c && lx > 0xb17217f7d1cf79acULL) ||
> (hx == 0x400d && lx < 0xb1c6a8573de9768cULL)) {
Don't duplicate the hex constants in the comment. The limits can be fuzzy
so don't need so many decimal digits or nonzero nonzero bits or LD80C()
to round them (s_cexp.c uses only 1 decimal digit and doesn't test the
lower 32 bits in the mantissa; here we have more bits than we need in lx).
Maybe not use bit-fiddling at all for this (in other precisions too).
The more important exp() functions check thresholds in FP. Here x can
be NaN so FP comparisons with it don't work right. That seems to be
the only reason to check in bits here.
> /*
> * x is between exp_ovfl and cexp_ovfl, so we must scale to
> * avoid overflow in exp(x).
> */
> RETURNI(__ldexp_cexpl(z, 1));
Did your tests cover this case, and does it use the fixed version of the
old __ldexp_cexpl() in k_expl.h?
> } else {
> /*
> * Cases covered here:
> * - x < exp_ovfl and exp(x) won't overflow (common case)
> * - x > cexp_ovfl, so exp(x) * s overflows for all s > 0
> * - x = +-Inf (generated by exp())
> * - x = NaN (spurious inexact exception from y)
> */
> exp_x = expl(x);
> sincosl(y, &s, &c);
> RETURNI(CMPLXL(exp_x * c, exp_x * s));
> }
> }
I think the last clause can be simplfied and even optimized by using the
kernel in all cases:
- the kernel produces exp_x in the form kexp_x * 2**k where kexp_x is near
1. Just multiply kexp_x by c and s, and later multiply by 2**k. This
cost an extra multiplication by 2**k (for the real and imaginary parts
separately), but avoids other work so may be faster.
- hopefully kexp_x is actually >= 1 (and < 2). Otherwise we have to
worry about spurious underflow
- the above method seems to avoid spurious underflow automatically, and
doesn't even mention this (multiplying by exp_x may underflow, but not
spuriously since c and s are <= 1).
Bruce
More information about the freebsd-numerics
mailing list