[PATCH] avoid function call overhead in tgammaf

Sun Feb 19 10:24:10 UTC 2017

On Sun, 19 Feb 2017, Bruce Evans wrote:

> On Sat, 18 Feb 2017, Steve Kargl wrote:
>
>> The following patch treats special values (i.e., +-inf, nan)
>> and values outside a limited domain to avoid the function
>> call overhead of tgamma.  Anyone with a commit bit is more
>> than welcomed to commit the patch (after of course a review).
>
> Since it is claimed to be "essentially useless", optimizing it doesn't
> seem useful.
>
>> Index: src/s_tgammaf.c
>> ===================================================================
>> --- src/s_tgammaf.c	(revision 1857)
>> +++ src/s_tgammaf.c	(working copy)
>> ...
>> +static u_int32_t overflow = 0x420c290f;		/*  35.0400981 */
>> +static u_int32_t underflow = 0x421a67d8;	/*  38.6014118 */
>
> This is a confusing name.  tgamma doesn't underflow like exp.  It overflows
> near negative integers, but below a threshold overflow can't happen because
> it is impossible to get near enough to an integer.  It also underflows for
> large negative args not near an integer.  The threshold here must be the
> minimum of the 2 thresholds.  Apparently, the one for underflow is always
> smaller (more negative) for float precision.  It is easy to check all cases
> for float precision to find the minimum.  Double precision uses a fuzzier
> threshold and spells it -190.  It uses the less fuzzy threshold -177.69 in
> a comment, except it misspells this as [plus] 177.69.

Testing shows that the threshold is too high (above the "underflow"
threshold).  Perhaps you forgot about gradual underflow.  The precise
value seems to be:

static u_int32_t underflow = 0x4224000b;	/*  -(-38.something) */

>
>> +static volatile float huge = 1.e30, tiny = 1.e-30;
>> +
>> float
>> tgammaf(float x)
>> {
>> +	u_int32_t hx;
>> +	int32_t ix, sg;
>> +
>> +	GET_FLOAT_WORD(hx, x);
>> +	ix = hx & 0x7fffffff;
>> +	sg = hx & 0x80000000;
>> +
>> +	if (ix > overflow) {
>> +		if (ix >= 0x7f800000)
>> +			return (sg ? x / x : x + x);
>> +		if (!sg)
>> +		 	return (huge * huge);
>> +	}
>> +
>> +	if (ix == 0)
>> +		return (1 / x);
>> +
>> +	if (sg && ix > underflow) {
>> +		/*
>> +		 * tgammaf(x) for integral x returns an NaN, so we implement
>> +		 * a poor man's rintf().
>> +		 */
>> +		volatile float vz;
>> +		float y,z;
>> +
>> +		y = -x;
>> +
>> +		vz = y + 0x1p23F;	/* depend on 0 <= y < 0x1p23 */
>> +		z = vz - 0x1p23F;	/* rintf(y) for the above range */
>> +		if (z == y)
>> +	    		return ((x - x) / (x - x));
>
> This has delicate optimizations which belong more in tgamma().  tgamma()
> just uses (x == ceil(x)) to classify negative integers.  tgamma() also
> has fewer/different style bugs.

The optimizations are quite broken.  x can be any finite non-integer
above ~38, so it doesn't satisfy y < 0x1p23.

I used the quick fix of copying the slow method from tgamma():

 		if (y == ceil(y))
 	    		return ((x - x) / (x - x));
>
>> +
>> +		return (tiny * tiny);
>
> The first error is immediately below the underflow threshold:
>
> X x =                           0xc21a67d9 -38.6014137
> X tgamma(x) = 0xb68ffffca9859b0c 0x80000000 -7.00648117e-46
> X tgammaf(x) =                  0          0 0
> X err = -0x800000000ffffe54 17179869184.50000
>
> This is just a sign error.  tgamma(x) is the small negative value
> ~ 7e-46.  This underflows to -0.0F, but tiny*tiny gives +0.0F.
>
> tgamma() does similar things plus extras to get the sign right.  The
> sign is -1 if ceil(x) is even for x below the underflow threshold.
> It is not so easy to determine the sign from the raw bits.

I used the quick fix of copying the slow method from tgamma():

 		z = ceil(y);
 		z = z / 2;
 		if (z == ceil(z))
 			return (tiny * tiny);
 		return (-tiny * tiny);

>
>> +	}
>> 
>> 	return (tgamma(x));
>> }

I tested this for all cases where hx >= 0xc2240000 in the default rounding
mode.  (Other modes aren't supposed to work, but they might, and ceil()
works right for them more clearly than the optimizations.)

Bruce