(2nd time) tweaks to erff() threshold values

Bruce Evans brde at optusnet.com.au
Fri Aug 23 11:12:48 UTC 2013

On Thu, 22 Aug 2013, Steve Kargl wrote:

> (Having problems with freebsd-numerics list.  Hopefully, not
> a duplicate.)

It was a duplicate :-).

> I would like to apply the patch, which follows my .sig,
> to msun/src/s_erff.c.  It tweaks the threshold values
> to those appropriate for erff().
> ...
> Index: src/s_erff.c
> ===================================================================
> --- src/s_erff.c	(revision 1358)
> +++ src/s_erff.c	(working copy)
> @@ -107,10 +107,10 @@
> 	}
> 	if(ix < 0x3f580000) {		/* |x|<0.84375 */
> -	    if(ix < 0x31800000) { 	/* |x|<2**-28 */
> -	        if (ix < 0x04000000)
> +	    if(ix < 0x39000000) { 	/* |x|<2**-13 */

This increases the number of incorrectly rounded cases slightly on i386,
but 2**-14 decreases it slightly.  This is because i386 has extra
precision for floats which the nonlinear return expression can actually
use for approx. 2**-14 < |x| < 2**-13.

> +	        if (ix < 0x04000000)	/* |x|<0x1p-119 */
> 		    /*avoid underflow */
> -		    return (float)0.125*((float)8.0*x+efx8*x);
> +		    return (8*x+efx8*x)/8;

Also put the comment back on the same line as the return statement now
that it fits again (see the double version).

Also add "spurious" to the comment.  Underflow still occurs for most denormal
x.  This is non-spurious.  We just avoid it for |x| larger than about 8
times the largest denormal, by multiplying efx by 8 so that it is larger
than 1 (efx8 = 8(efx)).

A recent discussion in comp.std.c made me more aware that underflow
is mishandled for most functions in libm.  They need to use something
like the above to handle it correctly.  For example, sin(x) returns x
with inexact for denormal x, but it should return x with inexact AND
UNDERFLOW for denormal x, since x is only an approximation to sin(x) --
the exact result is slightly smaller in magnitude in x, so is "tiny"
when x is, and underflow means tiny plus inexact.  The above still
fails to raise underflow if 8*x, efx8*x, and their sum are all exact.
8*x is exact if x has 3 lower bits 0, which is quite often, and the
others are rarely exact, except on i386 without clang everything is
always exact since it is in extra precision.

In nonstandard rounding modes, sin(x) shouldn't be rounded to x, but
fdlibm and FreeBSD don't try to support such rounding modes for most
functions including sin().  If they were supported, then they would
increase the complications for underflow.  IEEE x54 allows 2 alternatives
for underflow.  One is when the result is tiny after rounding, and
inexact, and the other is when the result is tiny before rounding,
and inexact.  This causes complications even in round-to-nearest mode.
Consider sin(x) on the largest denormal.  The result of x is tiny both
before and after rounding, and inexact, so underflow should be rasied.
Consider sin(x) on the smallest normal.  The result of x is normal
after rounding, so is not tiny, but before rounding it is smaller than
the smallest normal so should probably be considered as "tiny", so
underflow should be raised iff it depends on tinyness before rounding.

The discussion on comp.std.c was mainly about how extra precision
complicates things further.  When an expression like (8*x+efx8*x)/8
is evaluated in extra precision, nothing is tiny, so underflow is not
raised.  Inexact is probably not raised either for this particular
expression.  Underflow should be raised iff the extra precision is
discarded, but the information for raising it has been partly lost.
It will only be raised on conversion if the result is "tiny" then
(before or after rounding) and is inexact THEN.  It is inexact then
iff any of the low bits that are discarded is nonzero.  But sometimes
the extra-precision result makes all these bits zero and discards
nonzero bits even further out.  The delayed conversion cannot do the
right thing since it doesn't know about the bits already discarded.

C standards get this wrong by having strict requirements for raising
underflow.  It is very difficult to raise it exactly when specified.
C standards don't have strict requirements for raising inexact because
they know about this problem for inexact.  But underflow is even harded
to decide than inexact.

Implementations that support trapping of overflows have additional
complications.  IEEE x54 specifies that if traps for underflow are
enabled, then underflow traps occur iff the result is tiny, while the
underflow exception is raised iff the result is tiny and inexact.
Thus, underflow traps, if enabled, occur more often than underflow
exceptions.  The trapped case is simpler but different (if the trap
handler handler just returns).  C standards don't support trapping
of FP exceptions, and are even further from supporting returning from
trap handlers.

> 		return x + efx*x;
> 	    }
> 	    z = x*x;
> ...

Some modified lines:

@ 	if(ix < 0x3f580000) {		/* |x|<0.84375 */
@ 	    if(ix < 0x38800000) { 	/* |x|<2**-14 */
@ 	        if (ix < 0x04000000)	/* |x|<0x1p-119 */
@ 		    return (8*x+efx8*x)/8;  /*avoid spurious underflow */
@ 		}
@ 		return x + efx*x;
@ 	    }

The whole erf implementation is probably not the best way for float precision,
but is good for understanding higher precisions.


More information about the freebsd-numerics mailing list