(2nd time) tweaks to erff() threshold values
Bruce Evans
brde at optusnet.com.au
Mon Aug 26 15:23:33 UTC 2013
On Sun, 25 Aug 2013, Steve Kargl wrote:
> On Sat, Aug 24, 2013 at 07:30:29PM -0700, Steve Kargl wrote:
>> On Sat, Aug 24, 2013 at 01:21:02PM -0700, Steve Kargl wrote:
>>> On Fri, Aug 23, 2013 at 09:57:44AM -0700, Steve Kargl wrote:
>>>> On Fri, Aug 23, 2013 at 09:12:33PM +1000, Bruce Evans wrote:
>>>>
>>>>> The whole erf implementation is probably not the best way for
>>>>> float precision, but is good for understanding higher precisions.
>>>>
>>>> I'm fairly certain that the implementation of erff() is not very
>>>> efficient. The polynomials, used in the rational approximations,
>>>> are the same order as those used in the double precision approximation.
>>>> I'll check the polys when update my P(x)/Q(x) remes algorithm for
>>>> erfl and erfcl.
>>>
>>> I seem to be right (although I haven't iterated on the P's and Q's).
>>
>> Now, with pretty testing and (perhaps) better coefficients:
You mean pretty ugly formatting of the declarations of th coefficients to
match the existing style :-).
> These values where for the interval [0,0.84375].
>>
>> | usec/call | Max ULP | X Max ULP
>> ---------+-----------+---------+---------------------------------
>> new erfc | 0.02757 | 0.65947 | 8.19824636e-01, 0x1.a3c00ep-1
>> old erfc | 0.03348 | 0.68218 | 8.43257010e-01, 0x1.afbf62p-1
>> ---------+-----------+---------+---------------------------------
>> new erf | 0.02302 | 0.62437 | 1.04175471e-38, 0x1.c5bf88p-127
>> old erf | 0.03028 | 0.62437 | 1.04175471e-38, 0x1.c5bf88p-127
Be careful testing on i386. Its extra precision gives smaller errors.
These look like the i386 errors. Old erff has a maximum error of more
than 0.9 ulps on all args.
> New interval [0.84375,1.25].
>
> | usecs/call| Max ULP | X Max ULP
> ---------+-----------+---------+------------------------------
> Old erfc | 0.03469 | 0.95158 | 1.24999416e+00, 0x1.3fff9ep+0
> New erfc | 0.02781 | 0.76497 | 1.24971473e+00, 0x1.3fed4ep+0
> ---------+-----------+---------+------------------------------
> Old erf | 0.03404 | 0.55617 | 1.24996567e+00, 0x1.3ffdcp+0
> New erf | 0.02774 | 0.54213 | 8.52452755e-01, 0x1.b474bp-1
I see an improvement like this on amd64 but not on i386. New erf is
down from 0.9+ to 0.8+ on amd64 on all args, but still 0.6+ on i386.
erfc has much larger errors on other intervals and I didn't test this
interval by itself.
> Note that s_erf.c claims that in this interval, a taylor series about
> erf(1) is used and suggests that the constant erx = 8.4506291151e-01
> is erf(1). That's bogus as erf(1) = 8.42700779e-01F (in float).
Well, not quite. It says:
% * Remark: here we use the taylor series expansion at x=1.
% * erf(1+s) = erf(1) + s*Poly(s)
% * = 0.845.. + P1(s)/Q1(s)
% * That is, we use rational approximation to approximate
% * erf(1+s) - (c = (single)0.84506291151)
% * Note that |P1/Q1|< 0.078 for x in [0.84375,1.25]
% * where
% * P1(s) = degree 6 poly in s
% * Q1(s) = degree 6 poly in s
It puts some of erf(1) in 0.845 and some in P1(s)/Q1(s).
> New diff:
It is relative to a version that already has the threshold changes.
> Index: src/s_erff.c
> ...
> -pa0 = -2.3621185683e-03, /* 0xbb1acdc6 */
> ...
> -qa1 = 1.0642088205e-01, /* 0x3dd9f331 */
> ...
> +pa0 = 1.35131621e-08F, /* 0x1.d04f08p-27 */
> ...
> +qa1 = 6.06513679e-01F, /* 0x1.3688f6p-1 */
(qa0 is spelled 'one' in both versions.)
You still have some of erf(1) in pa0/qa0, but not as much as before. The
decomposition should be something like hi+lo -- write erf(1) = hi+lo and
put hi in erx and lo in pa0/qa0 = pa0. Is that exactly what you do? I
wonder if there is a technical reason for putting more in pa0/qa0. Or
maybe the whole expression should be written as hi+P/Q+lo, where P(0) = 0.
Sometimes rational approximations should be done by splitting out even
more terms from the P/Q part and adding the other terms more carefully.
The parts that are split out are chosen to make them easier to add up
carefully, while keeping the parts that are not split out small so that
they don't need to be evaluated carefully. See k_tan.c for an example.
Another bug in the comment is that it hasn't caught up the code. The
variable 'erx' was spelled 'c' in old BSD libm. It is still spelled
'c' in the fdlibm comment.
The comment does say that 'c' is rounded to single. That is good for
a hi+lo decomposition, and it may be necessary for technical reasons
to keep some of 'c's lower bits zero. Rounding it to single doesn't
give this when the precision is already single. Old erff didn't round
it further either.
Bruce
More information about the freebsd-numerics
mailing list