progress on powl.

Sat Dec 16 06:15:12 UTC 2017

On Sat, 16 Dec 2017, Bruce Evans wrote:

> ...
> It is suprising that P* is accurate enough for pow().  exp()
> exponentiates errors.  It is apparently enough to calculate y*log2(x)
> = n + y' accurately.  Since |y'| <= 0.5, the errors don't get
> ...
>
> My clog*(z) has similar problems for the real part.  At the end, it must
> calculate log*(x) where x has extra precision.  Unfortunately, only
> the internal kernels of logl() and of my uncommitted version of log()
> and logf() support extra precision.  Since I don't want to bloat clog()
> by open-coding these internals like pow() does for exp()'s internals,
> I just call the log*() functions.  This blows out the error to up to ~1.8
> ulps.

> [... using extra precision of exp*() kernels]

Also, the internal extra precision of log*() might be enough for
y*log2(x) = n+y'.  It is enough for plain log2*(x).  logl() has 8-10
extra bits internally, so log2l() has 6-8.  Except for my versions,
log() and logf() barely have enough bits to lose any for log2() and
log2f().  Probably not enough for pow*().  y' also needs to retain
extra precision, but that is routine using the method in log2l() (it
just discards extra precision at the end).  The 4.4BSD pow() is almost
trivial for non-exceptional args using its kernels:

X 	/* Return exp(y*log(x)), using simulated extended */
X 	/* precision for the log and the multiply.	  */
X 
X 	s = log__D(x);
X 	t.a = y;
X 	TRUNC(t.a);
X 	t.b = y - t.a;
X 	t.b = s.b*y + t.b*s.a;
X 	t.a *= s.a;
X 	s.a = t.a + t.b;
X 	s.b = (t.a - s.a) + t.b;
X 	return (exp__D(s.a, s.b));

while the log2l() non-kernel with debugging and i386 macros removed is:

X long double
X log2l(long double x)
X {
X 	struct ld r;
X 	long double hi, lo;
X 
X 	k_logl(x, &r);
X 	if (!r.lo_set)
X 		return (r.hi);
X 	_2sumF(r.hi, r.lo);
X 	hi = (float)r.hi;
X 	lo = r.lo + (r.hi - hi);
X 	return (invln2_hi * hi +
X 	    (invln2_lo + invln2_hi) * lo + invln2_lo * hi);
X }

You should see _2sumF() open-coded in this pow().  _2sumF() is delicate
and the open-coding is probably broken.  Compiler bugs break most
implementations of 2sum*() if there is extra precision.  _2sumF() has
complications to work around this.  The complications are mostly in the
instructions in the comments for using it.  For double precision, the
the args must be double_t, not double.  The 4.4BSD library doesn't know
anything about this, so its open coding is just broken on i386.  It is
unclear if or how fdlibm pow() avoids these bugs.  We fixed a couple
for exp().  math_private.h has debugging variants of _2sumF() and variants
with less restrictions on use.

pow() also has to do a multiplication for this.  We don't have any macros
like _2sumF() to help for this.  In general, for real values errors for
extra-precision multiplications are unimportant since there are no
cancelations.  Howver a + b*c might need extra precision since there
might be cancelations for the addition.  Not a problem for the final step
of pow*(), but a problem for the y' = y*log2(x) -n step.

Another problem with exp__D() and log__L() is that their API is asymmetric.
log__L() returns extra precision but doesn't take it.  exp__D() takes
extra precision but doesn't return it.  Both as needed for pow() and
perhaps *gamma*(), but too specialized.  General versions were too much
slower in C90, but now inline functions can take and return extra precision
and hope that the compiler optimizes it away when the caller supplies it
as arg.lo = 0 and the callee discards it by not using result.lo.

Bruce