Implementing cpow(3)

Bruce Evans brde at optusnet.com.au
Sat Oct 6 01:47:33 UTC 2012

```On Sat, 6 Oct 2012, Peter Jeremy wrote:

> I have been meditating on cpow(3) for some time without achieving
> enlightenment.  The following is somewhat of a braindump in the
> hope that someone will advise whether I am on the right track or
> have gone off the rails.
> ...

> - Because exp(x) pushes fraction bits into the exponent, achieving 53 bits
>  of precision in the result requires about 63 bits of input fraction.

Look at the real pow() to see how it gets enough (?) extra precision,
and also for how it handles exceptional and other special cases.  Look
at both the fdlibm version and the old BSD libm version (in the Attic,
unless svn lost it).  I've barely looked at them, but noticed that they
do the following for extra precision:
- fdlibm duplicates most of exp() and log() manually inline in exp(),
so as to get at their internals.  It is hard to see how this provides
enough extra precision, since the internals are only accurate to half
an ulp.  The power series are accurate to 1/32 ulps, but it is hard
to evaluate them that accurately.  Well, maybe pow() does the usual
extra-precision things for the highest term of the power series only.
- old BSD libm uses the extra-precision functions __exp__D() and
__log__D().  These have a chance of providing an extra 25 bits of
precision.  It is much easier to understand, so I will include parts
of it here:

%  * Required kernel functions:
%  *	exp__D(a,c)			exp(a + c) for |a| << |c|
%  *	struct d_double dlog(x)		r.a + r.b, |r.b| < |r.a|
% ...
% double pow(x,y)
% double x,y;
% {
% 	double t;
% 	if (y==zero)
% 		return (one);
% 	else if (y==one || (_IEEE && x != x))
% 		return (x);		/* if x is NaN or y=1 */
% 	else if (_IEEE && y!=y)		/* if y is NaN */
% 		return (y);
% 	else if (!finite(y))		/* if y is INF */
% 		if ((t=fabs(x))==one)	/* +-1 ** +-INF is NaN */
% 			return (y - y);
% 		else if (t>one)
% 			return ((y<0)? zero : ((x<zero)? y-y : y));
% 		else
% 			return ((y>0)? zero : ((x<0)? y-y : -y));
% 	else if (y==two)
% 		return (x*x);
% 	else if (y==negone)
% 		return (one/x);
%     /* x > 0, x == +0 */
% 	else if (copysign(one, x) == one)
% 		return (pow_P(x, y));
%
%     /* sign(x)= -1 */
% 	/* if y is an even integer */
% 	else if ( (t=drem(y,two)) == zero)
% 		return (pow_P(-x, y));
%
% 	/* if y is an odd integer */
% 	else if (copysign(t,one) == one)
% 		return (-pow_P(-x, y));

Apart from exceptional cases, it only has these special cases for integers.
This is not much different from fdlibm pow().

%
% 	/* Henceforth y is not an integer */
% 	else if (x==zero)	/* x is -0 */
% 		return ((y>zero)? -x : one/(-x));
% 	else if (_IEEE)
% 		return (zero/zero);
% 	else
% 		return (infnan(EDOM));
% }
% /* kernel function for x >= 0 */
% static double
% #ifdef _ANSI_SOURCE
% pow_P(double x, double y)
% #else
% pow_P(x, y) double x, y;
% #endif
% {
% 	struct Double s, t, __log__D();
% 	double  __exp__D(), huge = 1e300, tiny = 1e-300;
%
% 	if (x == zero)
% 		if (y > zero)
% 			return (zero);
% 		else if (_IEEE)
% 			return (huge*huge);
% 		else
% 			return (infnan(ERANGE));
% 	if (x == one)
% 		return (one);
% 	if (!finite(x))
% 		if (y < zero)
% 			return (zero);
% 		else if (_IEEE)
% 			return (huge*huge);
% 		else
% 			return (infnan(ERANGE));
% 	if (y >= 7e18)		/* infinity */
% 		if (x < 1)
% 			return(tiny*tiny);
% 		else if (_IEEE)
% 			return (huge*huge);
% 		else
% 			return (infnan(ERANGE));

Don't know why it handles some exceptional cases here instead of all
in the main function.

%
% 	/* Return exp(y*log(x)), using simulated extended */
% 	/* precision for the log and the multiply.	  */
%
% 	s = __log__D(x);
% 	t.a = y;
% 	TRUNC(t.a);
% 	t.b = y - t.a;
% 	t.b = s.b*y + t.b*s.a;
% 	t.a *= s.a;
% 	s.a = t.a + t.b;
% 	s.b = (t.a - s.a) + t.b;
% 	return (__exp__D(s.a, s.b));
% }

It handles all non-special cases here, using the formula with extra
precision.

Note that __log__D() starts with standard precision and produces extra
precision, while __exp__D() starts with extra precision and produces
standard precision.  This is exactly what is needed here.  These functions
are still in libm, since I brought them back to use with old BSD tgamma().
But they are not very good.  My log*() works similarly to __log__D().  It
currently only provides a static inline interface for exporting the extra
precision that it produces.  It produces only about 8 extra bits.
__log__D() has a chance of producing 26 extra, but I expect it barely
produces 8.  It is fairly easy to change tables and polynomial degrees
to produce 26 if required.  Current expl() has similar extra precision
internally to __log__D(), but it produces it instead of starting with
it, so it would be not so easy to convert it to __exp__D()'s interface.
However, some versions of exp2*() are a good base for such a conversion,
since exp2(x) == pow(2, x) so it has similar requirements.  It would
be generally useful to have versions of exp() and log() that both take
and return extra precision.  My log1pl() almost does this internally.

> - Since both sin() and cos() have a range of [-1,1], rrad can potentially
>  exceed DBL_MAX even though the return value is finite.

This is a solved problem.  See how cexp() and ccosh() handle it.

> ...
> Moving back a line, cabs(z) can avoid spurious overflow & provide a
> split result via:
>
> struct split {
> 	int n;
> 	double f;
> } cabs(double complex z)
> {
> 	double rz, iz;
> 	int nrz, niz;
> 	double frz, fiz;
> 	double y;
> 	int ny;
>
> 	rz = creal(z);
> 	nrz = ilogb(rz);
> 	frz = scalbn(rz, -nrz);
> 	iz = cimag(z);
> 	niz = ilogb(iz);
> 	/*
> 	 * sqrt(a*a + b*b) = abs(a) * sqrt(1 + (b/a)**2)
> 	 * For x << 1, sqrt(1 + x*x) is approximately (1 + x*x/2)
> 	 * therefore the "sqrt(1 + (b/a)**2)" term can be ignored
> 	 * if b is more than half the size of the mantissa smaller
> 	 * than a.
> 	 */
> 	if (nrz > niz + DBL_MANT_DIG/2)
> 		return ({nrz, fabs(frz)});
> 	if (niz > nrz + DBL_MANT_DIG/2)
> 		return ({niz, fabs(scalbn(iz, -niz))});
>
> 	fiz = scalbn(iz, -nrz);
> 	/* No underflow/overflow is possible due to restricted range */
> 	y = sqrt(frz*frz + fiz*fiz);
> 	ny = ilogb(y);
> 	return ({nrz + ny, scalbn(y, -ny)});
> }

Here you are reinventing a bit too much of hypot(), or rather clog().
Scaling is certainly necessary to avoid spurious overflow, so we
can't just use these functions.  There are also large problems with
accuracy, which the above doesn't solve and clog() probably doesn't
solve well enough for use here.  I snipped too much above so I quote
the pseudo-code again:

> double complex cpow(double complex z, double complex w)
> {
> 	double rw, iw;
>
> 	rw = creal(w);
> 	iw = cimag(w);

log(rad) cancels up to ~156 bits (3 * DBL_MAX less 3) when |z| is near 1,
so this won't work without tripled or quadrupled double precision.  clog()
has to work hard to produce 52 bits of precision using only doubled
double precision.

> 	th = carg(z);
>

There will be additional cancelations here.  It seems to hard to avoid
them completely, but it would be good to start with at least full precision
in lrad and th.  I know how to get full precision for lrad: in clog(),
where it evaluates log1p(hi + lo) in the critical case, use an
extra-precision log1p() on (hi, lo).  This would also give an extra-
precision result if hi+lo has extra precision.  The doubled double
precision caclulations usually give lots of extra precision in hi+lo,
but I forget if they give much extra when |z| is near 1.

> 	rth = rw * th + iw * lrad;

Can cancel too.

>
> }

No cancelations here.  This is "easy", since it is just like the solved
problem for ccosh().  We just have to scale the exp() factor for both.

> A potential enhancement would be to merge log(cabs(z)), noting that
>  log(sqrt(x)) == log(x)/2
> and
>  log(sqrt(x*x + y*y)) == log(abs(x) * sqrt(1 + (y*y)/(x*x)))
>                       == log(abs(x)) + log(1 + (y*y)/(x*x))/2
>                       == log(abs(x)) + log1p((y*y)/(x*x))/2
> (Note that the use of the log1p() approach would imply an additional
> lrad2 term that is not handled in the following)

clog() does stuff like this, especially in discarded versions.  |y/x|
must be fairly small, else this just gives an extra ulp or 2 of error.
If |y/x| is very small, then the above can be evaluated efficiently
using log1p(ratio) ~= ratio.  I tried this, but it didn't improve the
average speed, although the test emphasized unusual cases with small
or large ratios.  So I only use this when |y/x| is so small that it
would underflow, so the log1p() term goes away as a side effect of
avoiding underflow.

cpow() could only do better than clog() by inlining clog() and adding
complications.  Seems too hard.

I now see that the overflow avoidance in clog() is enough for calculating
lrad for use here, but then the expressions for rrad and rth may overflow.
These expressions just give ordinary complex multiplication, which may
overflow, and one reason we are writing them as real expressions is
to possibly avoid overflow.  It can be avoided by scaling |lrad| and
|th| to <= 0.5, with complications when one of them would underflow.

> Looking at "rth = rw * th + iw * lrad;": This is used solely as an
> argument for sin() or cos() - both of which have a period of 2Ï€,
> therefore rth can be evaluated modulo 2Ï€, as can each term in the
> performs a high-precision modulo 2Ï€:
>  rth = mod_2pi(rw) * th + mod_2pi(iw) * (mod_2pi(lrad0) + mod_2pi(lrad1));
> This should minimise the catastrophic cancellation.

I think Stephen said that this only works for special args.

The mod_2pi operation is difficult, but is supported by
__kernel_rem_pio2().  But this point shows that the cancelation problem
is larger than usual -- it is not just at 0, but at all multiples of
2*Pi (or Pi/2).  But I think it is much larger at 0, since other
multiples are hard to get close to.  Easier with 2 variables than 1
though.

> This leaves "rrad = exp(rw * lrad - iw * th);" or, given the lrad split:
>  x = rw * log(2) * nrad + rw * lrad1 - iw * th;

I snipped the part about the rad split.  I think it wouldn't work too well.
You would have to evaluate the above long expression for x mostly in
exitra precision.  It's easier to have an extra-precision log().  In
the above, the full extra precision expression is:

iw * (th_hi + th_lo);	/* not literal; also need hi+lo for rw... */

We can let the extra-precision log() do some of the additions, and "only"
have to evaluate:

x = rw * (lrad_hi + lrad_lo) - iw * (th_hi + th_lo);	/* not literal... */

> The ranges of the various terms are:
>  rw, iw:  Any double
> [... unreadable part with binary characters]
> ...
> In order to achieve the required range/precision, x needs to be expressed as:
>  x = k * log(2) + r,  |r| â‰¤ log(2) (ideally â‰¤ log(2)/2)
> Giving
>  rrad = 2**k * exp(r)
> or
>  double t = exp(r);
>  return (cpack(scalbn(t * cos(rth), k), scalbn(t * sin(rth), k)));
>
> Unfortunately, it's not immediately obvious to me how to get from
>  x = rw * log(2) * nrad + rw * lrad1 - iw * th;
> to
>  x = k * log(2) + r,  |r| â‰¤ log(2) (ideally â‰¤ log(2)/2)
> with the range/precision requirements.  In particular, all
> multiplications need to be able to return values exceeding DBL_MAX and
> have sufficient precision to provide a 53-bit final result in the face
> of catastrophic cancellation.

See cexp().  It calls __ldexp_cexp() to handle all the scaling details.
Oops, __ldexp_cexp() cannot handle extra precision that we may have in x,
we need significant extra precision in x since exp() will expand the error.
So we need an __ldexp_cexp() corresponding to __exp__D().  But that is easier
than getting enough precision in x.  I would try evaluating it all in merely
doubled double precision.  clog() shows how to do this for the simpler
expression ax*ax + ay*ay - 1.

Bruce
```