dead code in lgamma_r[f]?

Sun Dec 15 16:24:27 UTC 2013

On Mon, 9 Dec 2013, Steve Kargl wrote:

> On Tue, Dec 10, 2013 at 04:26:12PM +1100, Bruce Evans wrote:
>> On Mon, 9 Dec 2013, Steve Kargl wrote:
>>>  . In the domain [0,2), move the three minimax approximations embedded
>>>    within __ieee75_lgamma_r() into three 'static inline double' functions.
>>
>> This is too invasive.  It is only a style change to a slighly worse style.
>
> It's only a style change to a much better style in that I now
> have some idea of what to do get working ld80 and ld128 versions.

"slightly worse" != "much better" :-).  The disadvantages are that the
combinations of expressions are a little harder to see and more than a
little harder to debug (since debugging of inline functions is broken).

>> Like whitespace fixes.
>>
>>> 		r = -__ieee754_log(x);
>>> -		if(ix>=0x3FE76944) {y = one-x; i= 0;}
>>> -		else if(ix>=0x3FCDA661) {y= x-(tc-one); i=1;}
>>> -	  	else {y = x; i=2;}
>>> +		if (ix >= 0x3FE76944)
>>> +		    r += func0(1 - x);
>>> +		else if (ix >= 0x3FCDA661)
>>> +		    r += func1(x - (tc - 1));
>>> +	  	else
>>> +		    r += func2(x);
>>
>> The simplification from using func*() seems to be mainly avoiding a case
>> statement and the case selector variable i.
>
> It also allows one to clearly see that here func[0-2] are
> evaluations of lgamma(1+x) where subdomains in [0,2) are
> being mapped into other domains.

I missed that the [0-2] in the names are more than function numbers.
I still disagree with this change.

>>> ...
>>> 	    switch(i) {
>>> -	    case 7: z *= (y+6.0);	/* FALLTHRU */
>>> -	    case 6: z *= (y+5.0);	/* FALLTHRU */
>>> -	    case 5: z *= (y+4.0);	/* FALLTHRU */
>>> -	    case 4: z *= (y+3.0);	/* FALLTHRU */
>>> -	    case 3: z *= (y+2.0);	/* FALLTHRU */
>>> +	    case 7: z *= (y+6);	/* FALLTHRU */
>>> +	    case 6: z *= (y+5);	/* FALLTHRU */
>>> +	    case 5: z *= (y+4);	/* FALLTHRU */
>>> +	    case 4: z *= (y+3);	/* FALLTHRU */
>>> +	    case 3: z *= (y+2);	/* FALLTHRU */
>>
>> The comment indentation would be wrong now, except this file already uses
>> totally inconsistent comment indentation.  The float version blindly moved
>> the indentation in the other direction by adding float casts.
>
> The change reduces the diff between float and double versions.

That's the removal of the types in the constants.  The indentation shouldn't
be changed by blind substitution.

This block of code is bad for the float version anyway.  Testing another
version showed that the asymptotic formula gives enough precision for
floats (29 bits when evaluated with double coefficients and
insignificant rounding errors) starting at x = 4.   So 4 of the above 5
cases shouldn't have been there for float precision, and with only 1
case there shouldn't be a case statement.  Reducing the number of terms
in the asymptotic formula might eliminate the 1 case.

However, I couldn't get this to work (to remove 1 case), and you removed
4 terms in the asymptotic formula instead.  I think you couldn't get this
to work either :-) -- my calculations show that for gamma(4), this reduced
the accuracy of the formula from 29 bits to 18.  Not a problem, since the
threshold is still 8.  However, the accuracy at 8 is now only 24+ bits
(down from 49+).

Someone has an off-by-1 error -- 49+ bits is not enough for the
threshold of 8 which is used for double precision.  But a threshold
of 9 gives 57+ bits which is is good.

The fdlibm coefficients work much better than untuned Bernoulli
coefficients -- 4 or 5 more terms were needed with untuned Bernoulli
coeffs to get the same accuracy at x = 9.  I now see why your reduction
in the number of terms has little chance of working properly.  You
just removed the higher terms.  Cygnus already blindly rounded the
coefficients; that would de-tune them, and could easily make them worse
than untuned Bernoulli coefficients, and/or make the higher terms worse
than useless.  Thus removing the higher terms might have increased the
accuracy.  However, it has little chance of working as well as tuned
minimax coefficients.  As mentioned above, the fdlibm ones give a
surprising amount of accuracy.  It isn't clear that so much accuracy
is available in float precision without using relatively more terms.
My testing of this was sloppy -- I took the decimal coeffs and didn't
round them to double precision, and for w1 I used a high-precision
pari expression.

For accuracy, it seems best to use more terms in the asymptotic formula
so that it covers more cases without needing shifts up or down.  In
the asymptotic formula, the rounding error is dominated by the leading
terms but is quite large (several ulps).  The approximations for x
near 1 and 2 are more accurate, but the above loses accuracy in shifting
down.

My calculations involved trying to get my old shifting-up code to work
without explicit (slow) extra precision.  This showed that this is not
a good method for efficiency, even without the extra precision.  The
sloppy version of it runs barely faster than the less-sloppy fdlibm
version.  So the method is good for at most complex args, since I don't
know of another method for them (except args with negative real part
should start by using the reflection formula, since shifting up the
real part from large negative is too inefficient and inaccurate).

Bruce