(2nd time) tweaks to erff() threshold values

Wed Aug 28 11:31:30 UTC 2013

On Tue, 27 Aug 2013, Steve Kargl wrote:

> On Tue, Aug 27, 2013 at 09:24:39PM +1000, Bruce Evans wrote:
>> On Mon, 26 Aug 2013, Steve Kargl wrote:

>>> ...
>>> /*
>>> - * Coefficients for approximation to  erfc in [1.25,1/0.35]
>>> + *  Domain [1.25,1/0.35], range [-7.043e-10,7.457e-10]:
>>> + *  |log(x*erfc(x)) + x**2 + 0.5625 - r(x)/s(x)| < 2**-30
>>>  */
>>
>> It is even less clear than for the others that the scaling of the error
>> is correct.  The comment could make this clearer by giving the ratio of
>> the correct error scale with the one used in the approximation, at the
>> endpoints (max and min for this, usually at the endpoints).
>
> I'm not sure what you mean here.  In looking at error curves,
> it turns out that one of the 2 limits in the range is taken
> from an endpoint.

I mean that these are absolute errors for a function related to the
result that is especially obscure in this case -- the function is
log(x*result).  The error that needs to be minimized is the relative
error of the result, where "relative" is not quite the ratio of the
error and the result -- it is the ratio of the error and [result
rounded to a nearby power of 2; I'm not sure if the rounding is
up or down].  For the absolute error to be close enough to the
relative error, the relative error must not vary much across the
interval, AND the scaling for the abslute error must be compatible.
This is far from clear here.  I think taking logs changes the scale
for the absolute error to logarithmic, so this error is not really
absolute.  Hopefully the scale for the result is similarly logarithmic,
so that the scales are compatible.  There is also a factor of x before
taking logs, so the scaling is not logarithmic either.

>>> -	if (ix < 0x41e00000) {		/* |x|<28 */
>>> +	if (ix < 0x41200000) {		/* |x|<10 */
>>
>> 10 is too small.  It gives spurious underflow to 0 and larger than necessary
>> errors, at least on i386.  11 works.
>
> I changed to 11.  This required a new set of rb and sb coefficients
> as the old set was for [1/0.35, 10].  With the new set the max ulp
> for erfc goes up, but it is still much better that 63.something .

Something wrong here.  I got smaller errors by changing 10 to 11 without
changing the coeffs.  Now with updated coeffs, I get larger errors,
at least on i386 (previously, there were no errors of more than 1.5 ulps
above 1.88...  Now there are some above 2.04).

>>> +	    r  = expf(-z*z-0.5625F)*expf((z-x)*(z+x)+R/S);
>>
>> Good fix.  Is the point that -z*z-0.5625F must be exact, but the old
>> splitting only made z*z exact?  Is -z*z-0.5625F now always exact?
>> Anyway, the error for this case is now smaller than near 1.88, and
>> changing the expf()'s to exp()s doesn't reduce it much more, so the
>> old extra-precision methods can't be much better.
>
> Back to the choice of mask.  I was trying to understand
> the origins of the constant 0.5625F.  The asymptotic
> expansion of erfc(x) is (exp(-x**2) / (x*sqrt(pi))) * P(x)
> with P(x) being some polynomial.  It turns out
> exp(-0.5625) = 0.56978 and 1/sqrt(pi) = 0.56419.  So, the
> algorithm drops the 1/sqrt(pi) and approximates it by
> exp(-0.5625).

It is a hi+lo decomposition done multiplicatively and exponentially:

 	1/sqrt(pi) = exp(-0.5625) * exp(lo)

where -0.5625 and lo are folded into other terms, and the hi part must
be done exactly before exp() on it for enough accuracy.  Taking exp()
loses between 0.5 and 1 ulps of accuracy, and more is lost in the
produce of the exp()'s.  Hopefully the arg of the second exp() has
a small error, so not much more accuracy is lost for it.

> Well, in reading and re-reading the comment in s_erf.c, this
> part caught my attention:
>
> *     Note1:
> *	   To compute exp(-x*x-0.5625+R/S), let s be a single
> *	   precision number and s := x; then
> *		-x*x = -s*s + (s-x)*(s+x)
> *	        exp(-x*x-0.5626+R/S) =

The comment but not the code has this off-by-.0001 0.5626.

> *			exp(-s*s-0.5625)*exp((s-x)*(s+x)+R/S);
>
> This applies to the double precision version, ie erf(), so I
> simply ask the question whether 0xfffff000 was adequate masking
> out the low order bits. Fortunately, I only needed to test
> 0xffff8000, 0xffffc0000, and 0xffffe000.

I tried the others too, and they worked less well.  But I haven't
checked that -s*s-0.5625[F] is exact.  In double precision, many
more bits are stripped from x to create x, so exactness is less
surprising.

Bruce