(2nd time) tweaks to erff() threshold values
Steve Kargl
sgk at troutmask.apl.washington.edu
Tue Aug 27 18:47:08 UTC 2013
On Tue, Aug 27, 2013 at 09:24:39PM +1000, Bruce Evans wrote:
> On Mon, 26 Aug 2013, Steve Kargl wrote:
>
> > erfc (Look at the old and new ULP for 3rd and 4th interval. This is due
> > to the change in mask from 0xfffff000 to 0xffffe000.)
>
> A very interesting fix.
Well, I stumped across this fix. See below.
> Apparently the splitting point 0.84375 is not very well chosen, since
> we get largest errors just before it and much smaller errors above it
> by using another algorithm.
The only hint about the choice of 0.84375 is from the comment
int s_erf.c.
* is close to one. The interval is chosen because the fix
* point of erf(x) is near 0.6174 (i.e., erf(x)=x when x is
* near 0.6174), and by some experiment, 0.84375 is chosen to
* guarantee the error is less than one ulp for erf.
I haven't investigated whether 0.84375 is best choice. However,
exhaustive testing in [0.6, 0.62] gives a max ulp of 0.666.
> > /*
> > - * Coefficients for approximation to erf in [0.84375,1.25]
> > + * Domain [0, 0.84375], range ~[-5.4446e-10,5.5197e-10]:
> > + * |(erf(x) - x)/x - p(x)/q(x)| < 2**-31.
> > + */
>
> Extra spaces.
Fixed.
> > ...
> > /*
> > - * Coefficients for approximation to erfc in [1.25,1/0.35]
> > + * Domain [1.25,1/0.35], range [-7.043e-10,7.457e-10]:
> > + * |log(x*erfc(x)) + x**2 + 0.5625 - r(x)/s(x)| < 2**-30
> > */
>
> It is even less clear than for the others that the scaling of the error
> is correct. The comment could make this clearer by giving the ratio of
> the correct error scale with the one used in the approximation, at the
> endpoints (max and min for this, usually at the endpoints).
I'm not sure what you mean here. In looking at error curves,
it turns out that one of the 2 limits in the range is taken
from an endpoint.
> > +sa4 = -5.77397496e-02F, /* -0x1.d90108p-5 */
>
> > /*
> > - * Coefficients for approximation to erfc in [1/.35,28]
> > + * Domain [1.25,1/0.35], range [-3.753e-12,3.875e-12]:
> > + * |log(x*erfc(x)) + x**2 + 0.5625 - r(x)/s(x)| < 2**-43
> > */
>
> Domain needs more editing.
Fixed.
> > - if (ix < 0x41e00000) { /* |x|<28 */
> > + if (ix < 0x41200000) { /* |x|<10 */
>
> 10 is too small. It gives spurious underflow to 0 and larger than necessary
> errors, at least on i386. 11 works.
I changed to 11. This required a new set of rb and sb coefficients
as the old set was for [1/0.35, 10]. With the new set the max ulp
for erfc goes up, but it is still much better that 63.something .
./testf -m 2.857143 -M 11 -E double
Interval tested: [2.8571,11.0000]
Max ULP: 2.92095
x Max ULP: 4.08856726e+00, 0x1.05ab16p+2
> > + R=rb0+s*(rb1+s*(rb2+s*(rb3+s*rb4)));
> > + S=one+s*(sb1+s*(sb2+s*(sb3+s*sb4)));
> > }
> > GET_FLOAT_WORD(ix,x);
>
> This and the GET_FLOAT_WORD() for erff() are bogus. The word has already
> been read into hx.
Well spotted. Fixed.
> > - SET_FLOAT_WORD(z,ix&0xfffff000);
> > - r = __ieee754_expf(-z*z-(float)0.5625)*
> > - __ieee754_expf((z-x)*(z+x)+R/S);
> > + SET_FLOAT_WORD(z,ix&0xffffe000);
>
> Change further to use hx.
Fixed.
> > + r = expf(-z*z-0.5625F)*expf((z-x)*(z+x)+R/S);
>
> Good fix. Is the point that -z*z-0.5625F must be exact, but the old
> splitting only made z*z exact? Is -z*z-0.5625F now always exact?
> Anyway, the error for this case is now smaller than near 1.88, and
> changing the expf()'s to exp()s doesn't reduce it much more, so the
> old extra-precision methods can't be much better.
Back to the choice of mask. I was trying to understand
the origins of the constant 0.5625F. The asymptotic
expansion of erfc(x) is (exp(-x**2) / (x*sqrt(pi))) * P(x)
with P(x) being some polynomial. It turns out
exp(-0.5625) = 0.56978 and 1/sqrt(pi) = 0.56419. So, the
algorithm drops the 1/sqrt(pi) and approximates it by
exp(-0.5625).
Well, in reading and re-reading the comment in s_erf.c, this
part caught my attention:
* Note1:
* To compute exp(-x*x-0.5625+R/S), let s be a single
* precision number and s := x; then
* -x*x = -s*s + (s-x)*(s+x)
* exp(-x*x-0.5626+R/S) =
* exp(-s*s-0.5625)*exp((s-x)*(s+x)+R/S);
This applies to the double precision version, ie erf(), so I
simply ask the question whether 0xfffff000 was adequate masking
out the low order bits. Fortunately, I only needed to test
0xffff8000, 0xffffc0000, and 0xffffe000.
--
Steve
More information about the freebsd-numerics
mailing list