(2nd time) tweaks to erff() threshold values

Tue Aug 27 18:47:08 UTC 2013

On Tue, Aug 27, 2013 at 09:24:39PM +1000, Bruce Evans wrote:
> On Mon, 26 Aug 2013, Steve Kargl wrote:
> 
> > erfc  (Look at the old and new ULP for 3rd and 4th interval.  This is due
> >       to the change in mask from 0xfffff000 to 0xffffe000.)
> 
> A very interesting fix.

Well, I stumped across this fix.  See below.

> Apparently the splitting point 0.84375 is not very well chosen, since
> we get largest errors just before it and much smaller errors above it
> by using another algorithm.

The only hint about the choice of 0.84375 is from the comment
int s_erf.c.

 *	   is close to one. The interval is chosen because the fix
 *	   point of erf(x) is near 0.6174 (i.e., erf(x)=x when x is
 *	   near 0.6174), and by some experiment, 0.84375 is chosen to
 * 	   guarantee the error is less than one ulp for erf.

I haven't investigated whether 0.84375 is best choice.  However,
exhaustive testing in [0.6, 0.62] gives a max ulp of 0.666.

> > /*
> > - * Coefficients for approximation to  erf  in [0.84375,1.25]
> > + *  Domain [0, 0.84375], range ~[-5.4446e-10,5.5197e-10]:
> > + *  |(erf(x) - x)/x - p(x)/q(x)| < 2**-31.
> > + */
> 
> Extra spaces.

Fixed.

> > ...
> > /*
> > - * Coefficients for approximation to  erfc in [1.25,1/0.35]
> > + *  Domain [1.25,1/0.35], range [-7.043e-10,7.457e-10]:
> > + *  |log(x*erfc(x)) + x**2 + 0.5625 - r(x)/s(x)| < 2**-30
> >  */
> 
> It is even less clear than for the others that the scaling of the error
> is correct.  The comment could make this clearer by giving the ratio of
> the correct error scale with the one used in the approximation, at the
> endpoints (max and min for this, usually at the endpoints).

I'm not sure what you mean here.  In looking at error curves,
it turns out that one of the 2 limits in the range is taken
from an endpoint.

> > +sa4  = -5.77397496e-02F, /* -0x1.d90108p-5 */
> 
> > /*
> > - * Coefficients for approximation to  erfc in [1/.35,28]
> > + *  Domain [1.25,1/0.35], range [-3.753e-12,3.875e-12]:
> > + *  |log(x*erfc(x)) + x**2 + 0.5625 - r(x)/s(x)| < 2**-43
> >  */
> 
> Domain needs more editing.

Fixed.

> > -	if (ix < 0x41e00000) {		/* |x|<28 */
> > +	if (ix < 0x41200000) {		/* |x|<10 */
> 
> 10 is too small.  It gives spurious underflow to 0 and larger than necessary
> errors, at least on i386.  11 works.

I changed to 11.  This required a new set of rb and sb coefficients
as the old set was for [1/0.35, 10].  With the new set the max ulp
for erfc goes up, but it is still much better that 63.something .

./testf -m 2.857143 -M 11 -E double
Interval tested: [2.8571,11.0000]
    Max ULP: 2.92095
  x Max ULP: 4.08856726e+00, 0x1.05ab16p+2

> > +	        R=rb0+s*(rb1+s*(rb2+s*(rb3+s*rb4)));
> > +		S=one+s*(sb1+s*(sb2+s*(sb3+s*sb4)));
> > 	    }
> > 	    GET_FLOAT_WORD(ix,x);
> 
> This and the GET_FLOAT_WORD() for erff() are bogus.  The word has already
> been read into hx.

Well spotted.  Fixed.

> > -	    SET_FLOAT_WORD(z,ix&0xfffff000);
> > -	    r  =  __ieee754_expf(-z*z-(float)0.5625)*
> > -			__ieee754_expf((z-x)*(z+x)+R/S);
> > +	    SET_FLOAT_WORD(z,ix&0xffffe000);
> 
> Change further to use hx.

Fixed.

> > +	    r  = expf(-z*z-0.5625F)*expf((z-x)*(z+x)+R/S);
> 
> Good fix.  Is the point that -z*z-0.5625F must be exact, but the old
> splitting only made z*z exact?  Is -z*z-0.5625F now always exact?
> Anyway, the error for this case is now smaller than near 1.88, and
> changing the expf()'s to exp()s doesn't reduce it much more, so the
> old extra-precision methods can't be much better.

Back to the choice of mask.  I was trying to understand
the origins of the constant 0.5625F.  The asymptotic 
expansion of erfc(x) is (exp(-x**2) / (x*sqrt(pi))) * P(x)
with P(x) being some polynomial.  It turns out 
exp(-0.5625) = 0.56978 and 1/sqrt(pi) = 0.56419.  So, the
algorithm drops the 1/sqrt(pi) and approximates it by
exp(-0.5625).

Well, in reading and re-reading the comment in s_erf.c, this 
part caught my attention:

 *     Note1:
 *	   To compute exp(-x*x-0.5625+R/S), let s be a single
 *	   precision number and s := x; then
 *		-x*x = -s*s + (s-x)*(s+x)
 *	        exp(-x*x-0.5626+R/S) =
 *			exp(-s*s-0.5625)*exp((s-x)*(s+x)+R/S);

This applies to the double precision version, ie erf(), so I 
simply ask the question whether 0xfffff000 was adequate masking
out the low order bits. Fortunately, I only needed to test
0xffff8000, 0xffffc0000, and 0xffffe000.

-- 
Steve