erff, erfl, and erfcl patch

Tue Dec 3 00:57:01 UTC 2013

On Mon, 2 Dec 2013, Steve Kargl wrote:

> I intend to commit the patch that follows by sig in the next day or
> two.  In particular, I want to get the ld80 and ld128 versions of
> erfl and erfcl into tree as the current hack provided by imprecise.c
> is ugly.

Looks good, and a lot of work.  I don' have time to look at it closely
for a week or 2.

Did you find translating the fdlibm code, including keeping its style
bugs and algorithms to a fault, easier than using a new method like
you did for expl()?

> ...
> * msun/src/s_erf.c:
>  . Fix whitespace issues.
>  . Add weak references for erfl and erfcl on targets where long double
>    has 53 bits of precisions.
>
> * msun/src/s_erff.c:
>  . Consistently use lower case in hex constants.
>  . Update the rational approximations.
>  . Fix descriptions of the the domain and range.
>  . Remove leading zeros in exponents, e.g., 1.28379166e-01 becomes
>    1.28379166e-1.

Some style is still inconsistent.  Better not change it without making
it fully consistent.

> Index: ld128/s_erfl.c
> ===================================================================
> --- ld128/s_erfl.c	(revision 0)
> +++ ld128/s_erfl.c	(working copy)
> @@ -0,0 +1,347 @@
> ...
> +/*
> + * Domain [0, 0.84375], range ~[-2.076e-38, 2.074e-38]:
> + * |(erf(x) - x)/x - p(x)/q(x)| < 2**-126
> + */
> +static const long double
> +efx =  1.28379167095512573896158903121545167e-1L,  /* 0xecbff6a7, 0x481dd788, 0xb64d21a8, 0xeb06fc3f */
> +efx8=  1.02703333676410059116927122497236133L,     /* 0xecbff6a7, 0x481dd788, 0xb64d21a8, 0xeb06ff3f */
> +pp0 =  1.28379167095512573896158903121545167e-1L,  /* 0x3ffc06eb, 0xa8214db6, 0x88d71d48, 0xa7f6bfec */

There are really 2 subranges, one near 0 covered by a linear approximation
using efx and efx8, and the expanded comment doesn't really apply to this
range (though the comment might be true, p(x)/q(x) is not actually used for
this subrange).  Elsewhere, we often write similar comments, but we don't
put the coefficients for the approximation near 0 under the scope of the
comment.  This approximation near 0 is especially interesting so a separate
comment for it would be good.

> +pp1 = -3.14930453779220199897729762882897733e-1L,  /* 0xbffd427d, 0x20fdfc19, 0x5395ba59, 0x26b231dc */
> ...
> +qa12= -1.02073578608782110388686097137679971e-6L;  /* 0xbfeb1200, 0x6dd9de6c, 0x647bcf5b, 0xec577463 */
> ...

s_erf.c uses more spaces, so that there is always at least 1 space before
'=' (except for efx8, which is misformatted).  In fact, there are always
at least 2.  There, the coeff numbers don't go above 9, so there is more
space than necessary (except for lining up with the efx8 line if that were
not misformatted).  Here, using the same column for '=' as there is just
enough to leave 1 space after the wider coeffs.

> +long double
> +erfl(long double x)
> +{
> +	int32_t i;
> +	long double ax,R,S,P,Q,s,y,z,r;
> +	uint64_t lx, llx;
> +	uint16_t hx;
> +
> +	EXTRACT_LDBL128_WORDS(hx, lx, llx, x);
> +
> +	/* erf(+-0) = +-0 */
> +	if ((hx == 0 || hx == 0x8000) && (lx | llx) == 0)
> +	    return (x);

Why a special case for +-0?  s_erf.c doesn't have one.  I think it is correct
too.  erf(-0) is evaluated as 0.125*(8*-0+efx8*-0) = 0.125*(-0+-0) = -0,
hopefully in all rounding modes.

> +
> +	/* x is INF or NaN */
> +	if (hx >= 0x7fff) {
> +	    if (hx == 0xffff)
> +		return (-1);
> +	    if ((lx | llx) == 0)
> +		return (1);
> +	    return (x * x);
> +	}

s_erf.c has better comments.  x is _not_ INF or NaN before it is classified
as such...

This seems to be quite broken.  hx is unsigned, so (hx >= 0x7fff) is true
for all negative numbers, as well as for Infs and NaNs.  This gives the
bug that erfl(negative) = square(negative).  (hx == 0xffff) is true for
negative NaNs as well as for -Inf.  This gives the bug that
erfl(negative NaN) = -1.

s_erf.c avoids these bugs using the following methods:
- first mask out the sign bit: (ix = hx & MASK).
- be more careful testing for NaNs.  In the above, the (lx | llx) test
   would probably work if if it were not broken by being after the
   (hx == 0xffff) test; the latter is apparently to detect -Inf in a
   not so good way after not masking out the sign bit initially.
Then it uses a more magic expression than x*x so as to avoid the 3 cases
for +-1 and x*x.

In expl.c, we eventually found an even more magic expression that
avoids the (lx | llx) test.  This test tends to be slow (although it
is in a rarely executed case, just looking at llx sometimes confuses
the compiler to load llx for all cases).  The test is even more
complicated for ld80, since you have to handle the normalization bit
and should detected unsupported formats.  So it is good to avoid testing
this in bits.

For the magic expression, the one in s_erf.c is already good.  We don't
need to look at either lx or llx.  hx already consists of just the
exponent (and sign) bits.  This corresponds to masking off the mantissa
bits in s_erf.c.  (hx & 0x7fff) == 0x7fff classifies Infs and NaNs.
Divide 1 by x to turn +-Inf into +-0 and not change NaNs except to
quieten them.  This result is not used for +-Inf, but it keeps NaNs
unchanged except for quietening.  Then add the integer +-1 to get the
result +-1 for +-Inf and keep NaNs again unchanged except for
quietening.  s_erf.c calculates the +-1 as an integer.  I don't know
of any better way to turn +-Inf into +-1.  Expressions like x/Inf
don't quite work.

Similarly elsewhere.

> +
> +	ax = fabsl(x);
> +	if(ax < 0.84375) {

I agree with not using the bit tests here, at least if you can avoid
loading lx and llx at all.  (The way the above is written, lx and llx
are always loaded and there is no fast path for the usual case of finite
x where they are not used.)

> +	    if(ax < 0x1.p-34L) {

I like to omit the "point" if possible, and there is no need to use an
L suffix for small powers of 2.

> +	if (ax < LIM1) {

Not consistently misformatted as 'if(foo)'.

Why #defined constants for this and not for the others?  This one varies
between ld80 and ld128, so using it makes the diffs between those easier
to read, but it is a literal constant for f32 and d64, so using it makes
the diffs from ld* to the reference d64 harder to read.

> +	    s = one/(ax*ax);
> +	    if(ax < LIM0) {	/* |x| < 2.85715 */

Repeating magic numbers in comments defeats the point of using #define to
make them less magic.  LIM0 seems to be the same for all precisions, and
s_erf.c spells it less magically as 1/0.35.

> Index: ld80/s_erfl.c
> ===================================================================
> --- ld80/s_erfl.c	(revision 0)
> +++ ld80/s_erfl.c	(working copy)
> @@ -0,0 +1,353 @@
> ...
> +long double
> +erfl(long double x)
> ...
> +	/* x is INF or NaN */
> +	if (hx >= 0x7fff) {
> +		if (hx == 0xffff)
> +			return (-1);
> +		if (lx == 0x8000000000000000)
> +			return (1);
> +		return (x * x);
> +	}

After changing the first hx test to ((hx & 0x7fff) == 0x7fff) and removing
the second hx test as above, this seems to be a correct but slow way to
classify Infs and NaNs.  Oops, we still need a second hx test to classify
-Inf.  This must be done after testing the normalization bit:

 	/* Wrong comment intentionally left out. */
 	if ((hx & 0x7fff) == 0x7fff) {
 		if (lx == 0x8000000000000000ULL)
 			return (hx & 0x8000 ? -1 : 1);	/* erf(+-Inf) = +-1 */
 		return (x + x);		/* erf(NaN/pseudo{Inf,NaN}) = qNaN */
 	}

I don't like the ULL suffix, but it is needed to fix warnings for
non-C99 compilers, and I used it in s_logl.c.  The above may be slow,
but it is close to what I used in s_logl.c.  There it is impossible
to avoid loading lx.  s_logl.c is a little more complicated, both to
handle denormals and to classify unnormals as NaNs so that they get
handled by the x+x return.

It is better to avoid these complications by using the method in s_erf.c.

"Finite" unnormals are not classified by the hx test.  I think they are
hanled correctly by falling though, as in expl(), since there will by
an operation involving them.

> Index: src/s_erf.c
> ===================================================================
> --- src/s_erf.c	(revision 258855)
> +++ src/s_erf.c	(working copy)
> ...
> @@ -121,8 +121,8 @@
> /*
>  * Coefficients for approximation to  erf on [0,0.84375]
>  */
> -efx =  1.28379167095512586316e-01, /* 0x3FC06EBA, 0x8214DB69 */
> -efx8=  1.02703333676410069053e+00, /* 0x3FF06EBA, 0x8214DB69 */
> +efx =  1.28379167095512586316e-01,  /* 0x3FC06EBA, 0x8214DB69 */
> +efx8=  1.02703333676410069053e+00,  /* 0x3FF06EBA, 0x8214DB69 */
> pp0  =  1.28379167095512558561e-01, /* 0x3FC06EBA, 0x8214DB68 */
> pp1  = -3.25042107247001499370e-01, /* 0xBFD4CD7D, 0x691CB913 */
> pp2  = -2.84817495755985104766e-02, /* 0xBF9D2A51, 0xDBD7194F */

This lines up the comments, but the misformatting was not lining up either
the '=' or the first digit.

> Index: src/s_erff.c
> ===================================================================
> --- src/s_erff.c	(revision 258855)
> +++ src/s_erff.c	(working copy)
> @@ -21,62 +21,54 @@
>
> static const float
> tiny	    = 1e-30,
> -half=  5.0000000000e-01, /* 0x3F000000 */
> -one =  1.0000000000e+00, /* 0x3F800000 */
> -two =  2.0000000000e+00, /* 0x40000000 */
> +half=  5.0000000000e-1, /* 0x3f000000 */
> +one =  1.0000000000e+0, /* 0x3f800000 */
> +two =  2.0000000000e+0, /* 0x40000000 */

If changing these, fix the '=' and first digit to line up with later sections.

> /*
> - * Coefficients for approximation to erf on [0,0.84375]
> + * Domain [0, 0.84375], range ~[-5.4419e-10,5.5179e-10]:
> + * |(erf(x) - x)/x - p(x)/q(x)| < 2**-31
>  */
> -efx =  1.2837916613e-01, /* 0x3e0375d4 */
> -efx8=  1.0270333290e+00, /* 0x3f8375d4 */
> +efx  =  1.28379166e-1, /* 0x3e0375d4 */
> +efx8 =  1.02703333,    /* 0x3f8375d4 */
> +pp0  =  1.28379166e-1, /* 0x3e0375d4 */

This does line up the '=' and first digit.  So the formatting seems
consistent in this file except for half/one/two, but may be too different
from s_erfc.c

> /*
> - * Domain [0.84375, 1.25], range ~[-1.953e-11,1.940e-11]:
> - * |(erf(x) - erx) - p(x)/q(x)| < 2**-36.
> + * Domain [1.25,1/0.35], range ~[-4.821e-9,4.927e-9]:
> + * |log(x*erfc(x)) + x**2 + 0.5625 - r(x)/s(x)| < 2**-28
>  */
> -erx  =  8.42697144e-01F, /*  0x1.af7600p-1.  erf(1) rounded to 16 bits. */
> -pa0  =  3.64939137e-06F, /*  0x1.e9d022p-19 */
> -pa1  =  4.15109694e-01F, /*  0x1.a91284p-2 */
> -pa2  = -1.65179938e-01F, /* -0x1.5249dcp-3 */
> -pa3  =  1.10914491e-01F, /*  0x1.c64e46p-4 */
> -qa1  =  6.02074385e-01F, /*  0x1.344318p-1 */
> -qa2  =  5.35934687e-01F, /*  0x1.126608p-1 */
> -qa3  =  1.68576106e-01F, /*  0x1.593e6ep-3 */
> -qa4  =  5.62181212e-02F, /*  0x1.cc89f2p-5 */
> +ra0 = -9.88156721e-3, /* 0xbc21e64c */
> +ra1 = -5.43658376e-1, /* 0xbf0b2d32 */

Nah, this becomes inconsistent again by removing the space bnefore the '='
that is needed to like up with the longer variable name efx8 and a space
after that before the '='.

> @@ -113,12 +105,12 @@
> 	}
> 	x = fabsf(x);
>  	s = one/(x*x);
> -	if(ix< 0x4036DB6E) {	/* |x| < 1/0.35 */
> +	if(ix< 0x4036db8c) {	/* |x| < 2.85715 ~ 1/0.35 */

Don't give more detail than s_erf.c.  We are only keeping these comments
(that should be identical) because removing them would make the diffs
harder to read).

This also changes the style.  s_erf.c still uses an upper case hex constant
here.

Bruce