Implement cexpl

Wed Feb 27 18:06:25 UTC 2019

On Wed, Feb 27, 2019 at 07:42:41PM +1100, Bruce Evans wrote:
> On Tue, 26 Feb 2019, Steve Kargl wrote:
> 
> > On Sun, Feb 03, 2019 at 09:30:56AM -0800, Steve Kargl wrote:
> >> The following patch implements cexpl() for ld80 and ld128 architectures.
> > ...
> > Index: lib/msun/src/math_private.h
> > ===================================================================
> > --- lib/msun/src/math_private.h	(revision 344600)
> > +++ lib/msun/src/math_private.h	(working copy)
> > @@ -243,6 +243,20 @@
> > } while (0)
> >
> > /*
> > + * Get expsign and mantissa as 16 bit and 2 32-bit ints from an 80 bit long
> > + * double.
> > + */
> > +
> > +#define	EXTRACT_LDBL80_WORDS2(ix0,ix1,ix2,d)				\
> > +do {								\
> > +  union IEEEl2bits ew_u;					\
> > +  ew_u.e = (d);							\
> > +  (ix0) = ew_u.xbits.expsign;					\
> > +  (ix1) = ew_u.bits.manh;					\
> > +  (ix2) = ew_u.bits.manl;					\
> > +} while (0)
> 
> Don't add more of these accessor functions (more were already added to
> support powl, since powl doesn't use the FreeBSD spelling in APIs).

This response was not totally unexpected. :-)

I haven't looked a bit twiddling details in awhile, so it can
be considered an exercise in "Let's get something working".

> Use existing separate functions and let the compiler combine them.
> The one corresponding to this is EXTRACT_LDBL80_WORDS().  That extracts
> the mantissa into a single 64-bit word which is easier to use than the
> 2 32-bit words given by the above.  It also doesn't ask for pessimal
> extraction from memory 32 bits at a time, but compilers usually combine
> into a single 64-bit word for the memory access and then split into
> 2 32-bit words for 32-bit arches.
> 
> 3 words instead of 2 also enlarges diffs with the double precision case.
> 
> > Index: lib/msun/src/s_cexp.c
> > ===================================================================
> > --- lib/msun/src/s_cexp.c	(revision 344600)
> > +++ lib/msun/src/s_cexp.c	(working copy)
> > @@ -30,6 +30,7 @@
> > __FBSDID("$FreeBSD$");
> >
> > #include <complex.h>
> > +#include <float.h>
> > #include <math.h>
> >
> > #include "math_private.h"
> > @@ -41,12 +42,19 @@
> > double complex
> > cexp(double complex z)
> > {
> > -	double x, y, exp_x;
> > +	double c, exp_x, s, x, y;
> > 	uint32_t hx, hy, lx, ly;
> >
> > 	x = creal(z);
> > 	y = cimag(z);
> >
> > +	EXTRACT_WORDS(hx, lx, x);
> > +	/* cexp(0 + I y) = cos(y) + I sin(y) */
> > +	if (((hx & 0x7fffffff) | lx) == 0) {
> > +		sincos(y, &s, &c);
> > +		return (CMPLX(c, s));
> > +	}
> > +
> > 	EXTRACT_WORDS(hy, ly, y);
> > 	hy &= 0x7fffffff;
> >
> 
> Another reason I don't like sincos*() is that if the compiler can very
> easily do the above rewrite automatically.  The only large difficulty
> is to know if the sin(), cos() and sincos() are any good, or at least
> their relative badness.  x86 compilers already know that the i387 sin()
> and cos() are no good, (or perhaps they just don't know that the FreeBSD
> implementation doesn't make the i387 functions non-conforming to C99
> and/or POSIX (since they don't set errno and might do other side effects
> wrong)), so x86 compilers don't convert the above to i387 sin(), cos() or
> sincos().  It is even harder for compilers to know the quality of the
> library.

Unfortunately, the compiler doesn't do the rewrite automatically.
AFAICT, for gcc  CMPLX(cos(x),sin(x)) is transformed into
cexp(CMPLX(0,x)).  It then assumes that cexp handles the special 
case.  gcc doesn't do this optimization, yet.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89125

See comment #7.

> > @@ -53,10 +61,6 @@
> > 	/* cexp(x + I 0) = exp(x) + I 0 */
> > 	if ((hy | ly) == 0)
> > 		return (CMPLX(exp(x), y));
> 
> The comments aren't as careful as for hyperbolic functions.  The 0 here
> is actually -0 if y is -0.  The code handles both +-0 by clearing the
> sign bit in hy for the check byt keeping it in y.
> 
> > Index: lib/msun/ld128/k_cexpl.c
> > ...
> > +/*
> > + * __ldexp_exp(x, expt) and __ldexp_cexp(x, expt) compute exp(x) * 2**expt.
> > + * They are intended for large arguments (real part >= ln(DBL_MAX))
> > + * where care is needed to avoid overflow.
> > + *
> > + * The present implementation is narrowly tailored for our hyperbolic and
> > + * exponential functions.  We assume expt is small (0 or -1), and the caller
> > + * has filtered out very large x, for which overflow would be inevitable.
> > + */
> > +
> > +long double
> > +__ldexp_expl(long double x, int expt)
> > +{
> > +#if 0
> > +	double exp_x, scale;
> > +	int ex_expt;
> > +
> > +	exp_x = __frexp_exp(x, &ex_expt);
> > +	expt += ex_expt;
> > +	INSERT_WORDS(scale, (0x3ff + expt) << 20, 0);
> > +	return (exp_x * scale);
> > +#endif
> > +	return (x - x) / (x - x);
> > +}
> 
> This is less likely to work than the comment says :-).

I don't have ld128 hardware.  I can't write the functions.
When this is compile, one should see warnings 

#warning These functions are broken on ld128 architectures.
#warning Functions defined here are currently unused.
#warning Someone who cares needs to convert src/k_exp.c.

I suspect no one cares.

> > Index: lib/msun/ld128/s_cexpl.c
> > ===================================================================
> > --- lib/msun/ld128/s_cexpl.c	(nonexistent)
> > +++ lib/msun/ld128/s_cexpl.c	(working copy)
> > @@ -0,0 +1,76 @@
> > ...
> > +	exp_x = expl(x);
> > +	sincosl(y, &s, &c);
> > +	return (CMPLXL(exp_x * c, exp_x * s));
> 
> This doesn't use the kernel, although one is written (but always returns
> NaN).  For testing, the low quality kernel should at least use the
> above formula.

I don't have ld128 hardware.  I can't write the function in a manner
similar to src/s_cexp.c.  When this is compile, one should see warnings 

#warning cexpl is likely broken on ld128 architectures.
#warning Someone who cares needs to convert src/s_cexp.c.

I suspect no one cares.

Furthermore, when one links in cexpl(), she should see

__warn_references("Precision of cexpl may have lower than expected precision");

I suspect no one cares.

> > Index: lib/msun/ld80/s_cexpl.c
> > ===================================================================
> > --- lib/msun/ld80/s_cexpl.c	(nonexistent)
> > +++ lib/msun/ld80/s_cexpl.c	(working copy)
> > @@ -0,0 +1,105 @@
> > ...
> > +long double complex
> > +cexpl (long double complex z)
> > +{
> > +	long double c, exp_x, s, x, y;
> > +	uint32_t hwx, hwy, lwx, lwy;
> > +	uint16_t hx, hy;
> > +
> > +	ENTERI(z);
> > +
> > +	x = creall(z);
> > +	y = cimagl(z);
> > +
> > +	EXTRACT_LDBL80_WORDS2(hx, hwx, lwx, x);
> > +	EXTRACT_LDBL80_WORDS2(hy, hwy, lwy, y);
> 
> This asks for slowness relative to the double precision method, by doing
> 2 slow extraction steps up front.
> 
> This seems to be broken by not using the normal method of hx &= 0x7fff
> up front.
> 
> > +
> > +	/* cexp(0 + I y) = cos(y) + I sin(y) */
> > +	if ((hwx | lwx | (hx & 0x7fff)) == 0) {
> 
> The 2 words for the mantissa are just harder to use.  Write hwx | lwx
> normally as lx after extracting into 1 64-bit word.
> 
> This handles -0 correctly by masking hx with 0x7fff.
> 
> > +		sincosl(y, &s, &c);
> > +		RETURNI(CMPLXL(c, s));
> > +	}
> > +
> > +	/* cexp(x + I 0) = exp(x) + I 0 */
> > +	if ((hwy | lwy | (hy & 0x7fff)) == 0)
> > +		RETURNI(CMPLXL(expl(x), y));
> > +
> > +	if (hy >= 0x7fff) {
> 
> Bugs from not masking h[xy] with 0x7fff up front start here.  This
> misclassifies all negative y's except -0 as being NaNs.

Whoops.  I lost a month of testing due to the PAE vs non-PAE
fallout; yes, I should have done more testing.

-- 
Steve
20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4
20161221 https://www.youtube.com/watch?v=IbCHE-hONow