Sun Dec 8 18:33:46 UTC 2013

On Sun, Dec 08, 2013 at 03:21:04PM +1100, Bruce Evans wrote:
> On Sat, 7 Dec 2013, Steve Kargl wrote:
>
> > I really need to stop looking at fdlibm code.  The threshold(s) in
> > e_lgammaf.c could be chosen better (and actually documented).  In
> > the float code
> >
> >    /* 8.0 <= x < 2**58 */
> > 	} else if (ix < 0x4e800000) { // (ix < 0x5c800000) {
> > 	    t = __ieee754_logf(x);
> > 	    z = one/x;
> > 	    y = z*z;
> > 	    w = w0+z*(w1+y*(w2+y*(w3+y*(w4+y*(w5+y*w6)))));
> > 	    r = (x-half)*(t-one)+w;
> > 	} else
> >    /* 2**58 <= x <= inf */
> > 	    r =  x*(__ieee754_logf(x)-one);
> >
> > the 2**58 is much too large.  Asymptotically, we have
>
> That's Cygnus code, not fdlibm :-).
>
> > lgamma(x)~(x-0.5)log(x)-x = x*(log(x)-1) - 0.5*log(x)
> >
> > where the 0.5*log(x) is insignificant when compared to 2**58.
> > I suspect that 2**58 is a sloppy threshold for double, which
> > can be reduced to 2**30 (or smaller) for float.  I suppose I
> > could look for a tighter bound, but 2**(p+slop) with p, the
> > precision, probably is sufficient.
>
> The Cygnus translation to float didn't look closely enough at
> the thresholds.

It also did not recompute minimax coefficients as there are too
many terms in the approximation.  Looks like Cygnus simply rounded
the double coefficients to float.

> gamma() was the first part of libm that I looked closely at (even
> before trig functions in 2004/5).  I didn't understand it at first and
> tried to write my own.  My first idea was to use the asymptotic
> expansion as a kernel for all cases, especially for complex args.
> To use the asymptotic expansion, the real part of the arg must be
> pushed up to about 16 or greater using repeated multiplications.
> It is difficult to do these multiplications accurately enough,
> but the float case should be very easy using double precision
> multiplication.  I tried extra precision in software, but this
> was too slow.  This method also works poorly for args with negative
> real part, especially for lgamma() near its zeros.  Even using
> doubled double precision didn't get near 1 ulp of accuracy in float
> precision for lgamma() near its zeros, and the only fix that I
> know of is to use a special approximation near each zero down
> to about real part -70 (70 cases).  fdlibm is also bad for lgamma()
> near its zeros.

Interesting idea.  For x in [2,8), fdlibm code (as you probably
know) uses recursion to reduce the evaluation to a log of a
product and lgamma(2+s) with s in [0,1).  One can then show that
lgamma(2+s) = s * (1 - E) + s**2 * P(s) where E is Euler's constant
and P(s) is a polynomial in s.  But, instead of using a minimax
polynomial approximation fdlibm code uses a rational approximation
lgamma(2+s) = 0.5*s + p(s) / q(s).  This is the origin of the
s0, ..., s6, r1, ..., r6 coefficients.

> Here is a pari version of this (the C version is harder to understand
> due to its technical details and less suitable language; pari syntax
> is worse than C syntax for most things (including things that pari
> should be able to do well) but slightly better for this use:

I suppose one of these days, I'll learn how to use pari.
A quick scan of n1256.pdf shows that C99 does not require
lgamma of a complex argument, but it does reserve the
names clgamma[fl].

> % lg(z) =
> % {
> % 	local(bol, rnew);
> %
> % 	if (abs(z - 1) < 1.e-1, return (lngamma(z)););
> % 	if (abs(z - 2) < 1.e-1, return (lngamma(z)););
>
> Near 1 and 2, I planned to use a poly approximation, but that is not
> so easy to do in pari so I call the pari builtin lgamma() to fake it.

fdlibm's lgammaf_r does not appear to have an issue near 1 or 2.
Starting at the lower bound of each interval and using nextafterf
to scan up to the upper bound, I'm seeing

% make testf && ./testf
Interval:                       Time    ULP      Value
[4.7683716e-07, 2.0000000e+00): 0.04590 0.98491  1.231645e+00
[2.0000000e+00, 8.0000000e+00): 0.08754 0.82825  4.012439e+00
[8.0000000e+00, 2.8823038e+17): 0.05841 1.43789  8.942273e+06
[2.8823038e+17, 2.6584560e+36): 0.05095 0.50000  1.491249e+23

where the reference value is from lgamma_r.  The different
intervals test specific branches in lgammaf_r.  The upper
bound of 2.6584560e36 is 0x1p121.  Time is the average value
for all calls in the interval in usec/call.

> % 	bol = 0;
> % 	r = 1;
> % 	while (abs(z) < M,
> % 		rnew = r * z;
> % 		if (imag(r) > 0 && imag(rnew) < 0, bol += 1);
> % 		if (imag(r) < 0 && imag(rnew) > 0, bol -= 1);
> % 		r = rnew;
> % 		z = z + 1;
> % 	);
>
> This pushes up z, almost as above.
>
> % 	bol = 0;
> % 	return (lgb(z) - log(r) - 2 * Pi * bol);
>
> Then use the asymptotic expansion.
>
> % }
> %
> % lgb(z) = (z - 0.5) * log(z) - z + 0.5 * log(2 * Pi) + \
> %     sum(n = 1, N, b[n] / z^(2 * n - 1))
>
> This is the asymptotic expansion.  Pari syntax is only perfect for
> numeric things like this (and probably for some number theory things
> which I don't know well).   b[n] is a table of coefficients.
>
> I had forgotten the details of even the first term in this.  You
> mentioned using an approximation of just the first term or two above
> a huge threshold.  It might be better to go much lower using another
> term or two (but not as many as medium-sized args).

Well, I was only interested in the poorly chosen 2**58 threshold.
In the [8,2**58) interval the asymptotic expansion is used, but
its written in such a way that the

>%  + 0.5 * log(2 * Pi) +  sum(n = 1, N, b[n] / z^(2 * n - 1))

portion of your above code is approximated by a minimax polynomial.

/* 8.0 <= x < 2**58 */
} else if (ix < 0x5c800000) {
t = __ieee754_logf(x);
z = one/x;
y = z*z;
w = w0+z*(w1+y*(w2+y*(w3+y*(w4+y*(w5+y*w6)))));
r = (x-half)*(t-one)+w;
} else

fdlibm use only the first 2 terms for the x >= 2**58.

> Cephes has a very complicated lgamma() or gamma(), with many special
> poly approximations for 0 < x < 8 or similar.

fdlibm's approximations in the [0,2) range are mostly black magic
to me.

--
Steve