erff, erfl, and erfcl patch

Tue Dec 3 04:01:29 UTC 2013

On Tue, Dec 03, 2013 at 11:56:49AM +1100, Bruce Evans wrote:
> On Mon, 2 Dec 2013, Steve Kargl wrote:
> 
> > I intend to commit the patch that follows by sig in the next day or
> > two.  In particular, I want to get the ld80 and ld128 versions of
> > erfl and erfcl into tree as the current hack provided by imprecise.c
> > is ugly.
> 
> Looks good, and a lot of work.  I don' have time to look at it closely
> for a week or 2.

For not having time to look over the diff closely, you've
pointed out a few things I can fix/investigate.

> Did you find translating the fdlibm code, including keeping its style
> bugs and algorithms to a fault, easier than using a new method like
> you did for expl()?

Interesting question.  I have 2 more years of experience with fdlibm
code since I went looking for a different method for expl() (and logl()).
I find that I have to go through fdlibm code and add comments
and trace the code see what is happen.

I did look at replacing the rational approximations with simple
minimax polynomials over different domains.  minimax polynomials were
faster than the rational approximations, but I couldn't get the
max ULP below 1.4 for erff.

> > ...
> > * msun/src/s_erf.c:
> >  . Fix whitespace issues.
> >  . Add weak references for erfl and erfcl on targets where long double
> >    has 53 bits of precisions.
> >
> > * msun/src/s_erff.c:
> >  . Consistently use lower case in hex constants.
> >  . Update the rational approximations.
> >  . Fix descriptions of the the domain and range.
> >  . Remove leading zeros in exponents, e.g., 1.28379166e-01 becomes
> >    1.28379166e-1.
> 
> Some style is still inconsistent.  Better not change it without making
> it fully consistent.

I can leave s_erff.c for a later commit.  I'm much more interested
in getting ld80 and ld128 committed.

> > Index: ld128/s_erfl.c
> > ===================================================================
> > --- ld128/s_erfl.c	(revision 0)
> > +++ ld128/s_erfl.c	(working copy)
> > @@ -0,0 +1,347 @@
> > ...
> > +/*
> > + * Domain [0, 0.84375], range ~[-2.076e-38, 2.074e-38]:
> > + * |(erf(x) - x)/x - p(x)/q(x)| < 2**-126
> > + */
> > +static const long double
> > +efx =  1.28379167095512573896158903121545167e-1L,  /* 0xecbff6a7, 0x481dd788, 0xb64d21a8, 0xeb06fc3f */
> > +efx8=  1.02703333676410059116927122497236133L,     /* 0xecbff6a7, 0x481dd788, 0xb64d21a8, 0xeb06ff3f */
> > +pp0 =  1.28379167095512573896158903121545167e-1L,  /* 0x3ffc06eb, 0xa8214db6, 0x88d71d48, 0xa7f6bfec */
> 
> There are really 2 subranges, one near 0 covered by a linear approximation
> using efx and efx8, and the expanded comment doesn't really apply to this
> range (though the comment might be true, p(x)/q(x) is not actually used for
> this subrange).  Elsewhere, we often write similar comments, but we don't
> put the coefficients for the approximation near 0 under the scope of the
> comment.  This approximation near 0 is especially interesting so a separate
> comment for it would be good.

Yes, of course, you're right.  I'll expand the comment to note
the subrange is not included in the error estimate, and I'll
update the comment to use pp(x)/qq(x).

> > +pp1 = -3.14930453779220199897729762882897733e-1L,  /* 0xbffd427d, 0x20fdfc19, 0x5395ba59, 0x26b231dc */
> > ...
> > +qa12= -1.02073578608782110388686097137679971e-6L;  /* 0xbfeb1200, 0x6dd9de6c, 0x647bcf5b, 0xec577463 */
> > ...
> 
> s_erf.c uses more spaces, so that there is always at least 1 space before
> '=' (except for efx8, which is misformatted).  In fact, there are always
> at least 2.  There, the coeff numbers don't go above 9, so there is more
> space than necessary (except for lining up with the efx8 line if that were
> not misformatted).  Here, using the same column for '=' as there is just
> enough to leave 1 space after the wider coeffs.

OK. I'll align the '=' and ensure at least one space precedes it.

> 
> > +long double
> > +erfl(long double x)
> > +{
> > +	int32_t i;
> > +	long double ax,R,S,P,Q,s,y,z,r;
> > +	uint64_t lx, llx;
> > +	uint16_t hx;
> > +
> > +	EXTRACT_LDBL128_WORDS(hx, lx, llx, x);
> > +
> > +	/* erf(+-0) = +-0 */
> > +	if ((hx == 0 || hx == 0x8000) && (lx | llx) == 0)
> > +	    return (x);
> 
> Why a special case for +-0?  s_erf.c doesn't have one.  I think it is correct
> too.  erf(-0) is evaluated as 0.125*(8*-0+efx8*-0) = 0.125*(-0+-0) = -0,
> hopefully in all rounding modes.

I'll check again, but I recall that n1256.pdf says erf(+-0) = +-0
is exact.  Does falling through avoid raising any exception?

> > +
> > +	/* x is INF or NaN */
> > +	if (hx >= 0x7fff) {
> > +	    if (hx == 0xffff)
> > +		return (-1);
> > +	    if ((lx | llx) == 0)
> > +		return (1);
> > +	    return (x * x);
> > +	}
> 
> s_erf.c has better comments.  x is _not_ INF or NaN before it is classified
> as such...

The comment isn't for s_erf.c.  It for me. :-)  I can move
inside the 'if (hx >= 0x7fff)'.

> This seems to be quite broken.  hx is unsigned, so (hx >= 0x7fff) is true
> for all negative numbers, as well as for Infs and NaNs.  This gives the
> bug that erfl(negative) = square(negative).  (hx == 0xffff) is true for
> negative NaNs as well as for -Inf.  This gives the bug that
> erfl(negative NaN) = -1.

Possibly, it broken.  I changed the previous method that I used
to the above, because you weren't happy with it either. 

> 
> s_erf.c avoids these bugs using the following methods:
> - first mask out the sign bit: (ix = hx & MASK).
> - be more careful testing for NaNs.  In the above, the (lx | llx) test
>    would probably work if if it were not broken by being after the
>    (hx == 0xffff) test; the latter is apparently to detect -Inf in a
>    not so good way after not masking out the sign bit initially.
> Then it uses a more magic expression than x*x so as to avoid the 3 cases
> for +-1 and x*x.

I don't like magic fdlibm expressions. :)

> In expl.c, we eventually found an even more magic expression that
> avoids the (lx | llx) test.  This test tends to be slow (although it
> is in a rarely executed case, just looking at llx sometimes confuses
> the compiler to load llx for all cases).  The test is even more
> complicated for ld80, since you have to handle the normalization bit
> and should detected unsupported formats.  So it is good to avoid testing
> this in bits.
> 
> For the magic expression, the one in s_erf.c is already good.  We don't
> need to look at either lx or llx.  hx already consists of just the
> exponent (and sign) bits.  This corresponds to masking off the mantissa
> bits in s_erf.c.  (hx & 0x7fff) == 0x7fff classifies Infs and NaNs.
> Divide 1 by x to turn +-Inf into +-0 and not change NaNs except to
> quieten them.  This result is not used for +-Inf, but it keeps NaNs
> unchanged except for quietening.  Then add the integer +-1 to get the
> result +-1 for +-Inf and keep NaNs again unchanged except for
> quietening.  s_erf.c calculates the +-1 as an integer.  I don't know
> of any better way to turn +-Inf into +-1.  Expressions like x/Inf
> don't quite work.

I'll look at it again, but I found these magic expressions to be
a tad too ugly for my taste.

> > +	if (ax < LIM1) {
> 
> Not consistently misformatted as 'if(foo)'.
> 
> Why #defined constants for this and not for the others?  This one varies
> between ld80 and ld128, so using it makes the diffs between those easier
> to read, but it is a literal constant for f32 and d64, so using it makes
> the diffs from ld* to the reference d64 harder to read.

I can't remember why I used the #defines other than I flipped 
between using 1/0.35 and 2.85715.  I probably got tired of
editing the different locations.

> > +	    s = one/(ax*ax);
> > +	    if(ax < LIM0) {	/* |x| < 2.85715 */
> 
> Repeating magic numbers in comments defeats the point of using #define to
> make them less magic.  LIM0 seems to be the same for all precisions, and
> s_erf.c spells it less magically as 1/0.35.

1/0.35 is exactly representable.  2.85715 is exactly representable.

> > Index: ld80/s_erfl.c
> > ===================================================================
> > --- ld80/s_erfl.c	(revision 0)
> > +++ ld80/s_erfl.c	(working copy)
> > @@ -0,0 +1,353 @@
> > ...
> > +long double
> > +erfl(long double x)
> > ...
> > +	/* x is INF or NaN */
> > +	if (hx >= 0x7fff) {
> > +		if (hx == 0xffff)
> > +			return (-1);
> > +		if (lx == 0x8000000000000000)
> > +			return (1);
> > +		return (x * x);
> > +	}
> 
> After changing the first hx test to ((hx & 0x7fff) == 0x7fff) and removing
> the second hx test as above, this seems to be a correct but slow way to
> classify Infs and NaNs.  Oops, we still need a second hx test to classify
> -Inf.  This must be done after testing the normalization bit:
> 
>  	/* Wrong comment intentionally left out. */
>  	if ((hx & 0x7fff) == 0x7fff) {
>  		if (lx == 0x8000000000000000ULL)
>  			return (hx & 0x8000 ? -1 : 1);	/* erf(+-Inf) = +-1 */
>  		return (x + x);		/* erf(NaN/pseudo{Inf,NaN}) = qNaN */
>  	}
> 
> I don't like the ULL suffix, but it is needed to fix warnings for
> non-C99 compilers, and I used it in s_logl.c.  The above may be slow,
> but it is close to what I used in s_logl.c.  There it is impossible
> to avoid loading lx.  s_logl.c is a little more complicated, both to
> handle denormals and to classify unnormals as NaNs so that they get
> handled by the x+x return.

I vaguely recall using a "xxx == 0x7ff" type of test.  I may 
have it in a different msun tree or worse lost it.  I go over
the special case handling.

-- 
Steve