progress on powl.

Sat Dec 16 04:09:02 UTC 2017

On Fri, 15 Dec 2017, Steve Kargl wrote:

> As expect no one elase is working on powl, I have
> dusted off my work-in-progress that is now some
> 1+ year in the works. I've now got a sort of
> working version in very limited testing.
> ...
> I'm still slowing working out some thresholds and
> few other kinks.  The biggest problem is that
> src/e_pow.c may be the worse commented code that
> I've had the pleasure to read.  There are 3 polynomial
> approximations.  One is easy to work.  The other two
> are gaint mysteries at moment.

It seems OK to me.  I only see 2 polys and a simple rational function.

One for log() is obviously needed, and the special one used is
documented.  P* for exp() is not documented in e_pow.c and is not
obvious, but is identical to P*() for exp() in e_exp.c.  exp()
converges too slowly so is reduced to a special rational function
related to (exp(x)+1)/(exp(x)-1) which is related to Bernoulli numbers
and thus to lots of interesting power series.  The special rational
function is basically just this rational function in exp(x) inverted.

It is suprising that P* is accurate enough for pow().  exp()
exponentiates errors.  It is apparently enough to calculate y*log2(x)
= n + y' accurately.  Since |y'| <= 0.5, the errors don't get
exponentiated.  But then we have to evaluate exp(y'*log2) accurately,
and I'm surprised if the naive method in the comment is enough.  We
should evaluate y' in extra precision and used this for calculating
y'*log2, and I think we do.  Then ignoring the error from this is
apparently OK.  It should be < 0.5 ulps, and then it is easy to avoid
adding more than 1 ulp of error for exp() but not so easy to avoid
adding more than 0.5 ulps of error.  However, if we just use exp(x) =
1 + x + 2*x/2 + ..., then this is easy enough.  We have x in extra
precision and should add it to 1 in extra precision, and the other
terms are too small to contribute much to the error.  Different care
is needed for the rational function.

My clog*(z) has similar problems for the real part.  At the end, it must
calculate log*(x) where x has extra precision.  Unfortunately, only
the internal kernels of logl() and of my uncommitted version of log()
and logf() support extra precision.  Since I don't want to bloat clog()
by open-coding these internals like pow() does for exp()'s internals,
I just call the log*() functions.  This blows out the error to up to ~1.8
ulps.

See pow.c in 4.4BSD libm for a more brute force approach that is probably
better.  Part are the same, down to misspelling "multi" as "muti".  Maybe
a watermark.  We still have extra-precision exp and log functions which
were used for this and which w use for tgamma.  Unfortunately, these are
only extra for double precision.  This prevents a simple translation of
tgamma() too.  The kernels exp__D.c and log__L.c are actually simple,
especially the latter, so this is not a good excuse for not translating
them.

BTW, you (er, I) have not got around to committing my double and float
versions of coshl(), sinhl() and tanhl().  But expl() already has your
kernel in a non-external place.  The extra-precision kernels should
work better than the open-coded ones in powl().  I now remember that
even the double and float internal ones have a couple of bits of extra
precision, while your expl() one has 8-10.  I was going to translate
the expl() kernel to lower precision, but found that the old internal
ones are good enough in both accuracy and efficency for the hyperbolic
functions.  So my hyperbolic functions in double and float precision
are almost identical with the ones in long double precision, except
for details hidden in the kernels.  I only cleaned up the double and
float kernels a little when reorganizing to separate them.  However,
I couldn't get the kernels to work as fast as open coding for exp*()
itself.  clang now optimizes the old exp*() _very_ well so it shouldn't
have any problems turning the kernels into the same code as the open-
coded versions.

The old exp() kernels only have a couple of bits of extra precision,
but this is enough.  i386 now uses the i387 for exp() but not for
expf().  This is slow since it has to switch precisions.  This is
not very accurate even with 11 extra bits of precision, since the
method is sloppy giving and loses about 8 of the extras.  This is not
significantly better than the 2 extra bits given by the kernels.
Except possibly for exp().  If you have an error of 0.76 ulps than
another 0.25 ulps takes the final error above 1.00.  The expl() kernel
shouldn't have this problem since its error is smaller than the exp()
kernel or any adjustments that pow() could reasonably make to its
open-coded version of this.

The float coeffs for P* should be easy to fix since we already did that
if necessary for their copies in expf().

Bruce