Implementing cpow(3)
Peter Jeremy
peter at rulingia.com
Fri Oct 5 21:32:28 UTC 2012
I have been meditating on cpow(3) for some time without achieving
enlightenment. The following is somewhat of a braindump in the
hope that someone will advise whether I am on the right track or
have gone off the rails.
A naive solution to cpow(3) looks like:
double complex cpow(double complex z, double complex w)
{
return (cexp(w * clog(z));
}
but this suffers from both spurious exceptions and precision loss. A
somewhat expanded approach makes this more obvious:
double complex cpow(double complex z, double complex w)
{
double rw, iw;
double rad, lrad, th;
double rrad, rth;
rw = creal(w);
iw = cimag(w);
rad = cabs(z);
lrad = log(rad);
th = carg(z);
rrad = exp(rw * lrad - iw * th);
rth = rw * th + iw * lrad;
return (cpack(rrad * cos(rth), rrad * sin(rth)));
}
This approach has the advantage of not requiring any complex
arithmetic and it's also easier to explicitly handle exception cases.
Working through the spurious exceptions and precision loss issues:
- cabs(z) can overflow when z has large but finite real & imaginary parts.
- log(x) pushes exponent bits into the fraction, thus a double precision
log(x) loses about 10 bits of precision.
- The '-' & '+' expressions can both suffer catastrophic cancellation.
- Because exp(x) pushes fraction bits into the exponent, achieving 53 bits
of precision in the result requires about 63 bits of input fraction.
- Since both sin() and cos() have a range of [-1,1], rrad can potentially
exceed DBL_MAX even though the return value is finite.
Note that from here on, the code is intended to demonstrate the
general approach, rather than as committable code, and assumes that 0,
Inf and NaN are special cased externally. Note that denormals need
special handling to do ilogb() and scalbn() via bit-manipulation.
The precision loss in log(rad) can be avoided by separately handling the
exponent bits - ie split rad into int nrad and double frad such that:
rad = 2**nrad * frad, 1 <= frad < 2
Then:
lrad = log(rad)
= log(2**nrad * frad)
= log(2) * nrad + log(frad)
The limited range of frad means that log(frad) should not lose any
precision and the two sides of the addition can be stored separately.
For later use, assume:
double lrad0 = log(2.) * (double)nrad;
double lrad1 = log(frad);
Note that for x << 1, log(1 + x) =~ x. Therefore lrad1 is either 0 or
1p-53 ≤ lrad1 < log(2) ~= 0.693.
Moving back a line, cabs(z) can avoid spurious overflow & provide a
split result via:
struct split {
int n;
double f;
} cabs(double complex z)
{
double rz, iz;
int nrz, niz;
double frz, fiz;
double y;
int ny;
rz = creal(z);
nrz = ilogb(rz);
frz = scalbn(rz, -nrz);
iz = cimag(z);
niz = ilogb(iz);
/*
* sqrt(a*a + b*b) = abs(a) * sqrt(1 + (b/a)**2)
* For x << 1, sqrt(1 + x*x) is approximately (1 + x*x/2)
* therefore the "sqrt(1 + (b/a)**2)" term can be ignored
* if b is more than half the size of the mantissa smaller
* than a.
*/
if (nrz > niz + DBL_MANT_DIG/2)
return ({nrz, fabs(frz)});
if (niz > nrz + DBL_MANT_DIG/2)
return ({niz, fabs(scalbn(iz, -niz))});
fiz = scalbn(iz, -nrz);
/* No underflow/overflow is possible due to restricted range */
y = sqrt(frz*frz + fiz*fiz);
ny = ilogb(y);
return ({nrz + ny, scalbn(y, -ny)});
}
A potential enhancement would be to merge log(cabs(z)), noting that
log(sqrt(x)) == log(x)/2
and
log(sqrt(x*x + y*y)) == log(abs(x) * sqrt(1 + (y*y)/(x*x)))
== log(abs(x)) + log(1 + (y*y)/(x*x))/2
== log(abs(x)) + log1p((y*y)/(x*x))/2
(Note that the use of the log1p() approach would imply an additional
lrad2 term that is not handled in the following)
Looking at "rth = rw * th + iw * lrad;": This is used solely as an
argument for sin() or cos() - both of which have a period of 2π,
therefore rth can be evaluated modulo 2π, as can each term in the
expression. Assuming lrad is split as (lrad0 + lrad1) and mod_2pi()
performs a high-precision modulo 2π:
rth = mod_2pi(rw) * th + mod_2pi(iw) * (mod_2pi(lrad0) + mod_2pi(lrad1));
This should minimise the catastrophic cancellation.
This leaves "rrad = exp(rw * lrad - iw * th);" or, given the lrad split:
x = rw * log(2) * nrad + rw * lrad1 - iw * th;
rrad = exp(x);
The ranges of the various terms are:
rw, iw: Any double
-π ≤ th ≤ π
-1074 ≤ nrad ≤ 1024 (allowing for denormals & z = DBL_MAX * (1 + i))
lrad1 = 0 or 1p-53 ≤ lrad1 < log(2) ~= 0.693.
Since the final result is
return (rrad * cis(rth));
and the range of cis(t) is
|cis(t)| ≤ 1
or, alternatively ignoring 0 values:
-1074 ≤ log₂(|cis(t)|) ≤ 0
this implies rrad needs to have a range
-1074 ≤ log₂(|rrad|) ≤ 2097
(the latter value allows for rrad * DBL_MIN = DBL_MAX)
and provide 53 bits of precision over
-1023 < log₂(|rrad|) < 2047
In order to achieve the required range/precision, x needs to be expressed as:
x = k * log(2) + r, |r| ≤ log(2) (ideally ≤ log(2)/2)
Giving
rrad = 2**k * exp(r)
or
double t = exp(r);
return (cpack(scalbn(t * cos(rth), k), scalbn(t * sin(rth), k)));
Unfortunately, it's not immediately obvious to me how to get from
x = rw * log(2) * nrad + rw * lrad1 - iw * th;
to
x = k * log(2) + r, |r| ≤ log(2) (ideally ≤ log(2)/2)
with the range/precision requirements. In particular, all
multiplications need to be able to return values exceeding DBL_MAX and
have sufficient precision to provide a 53-bit final result in the face
of catastrophic cancellation.
Does anyone have any suggestions as to how to proceed?
--
Peter Jeremy
More information about the freebsd-numerics
mailing list