Use of C99 extra long double math functions after r236148
Bruce Evans
brde at optusnet.com.au
Sun Aug 12 23:00:29 UTC 2012
On Sat, 21 Jul 2012, Peter Jeremy wrote:
> Hi Bruce or das@ or Steve,
>
> I have a question on the following code from s_ccosh.c:
> % /*
> % * cosh(NaN + I NaN) = d(NaN) + I d(NaN).
> % *
> % * cosh(NaN +- I Inf) = d(NaN) + I d(NaN).
> % * Optionally raises the invalid floating-point exception.
> % * Choice = raise.
> % *
> % * cosh(NaN + I y) = d(NaN) + I d(NaN).
> % * Optionally raises the invalid floating-point exception for finite
> % * nonzero y. Choice = don't raise (except for signaling NaNs).
> % */
> % return (cpack((x * x) * (y - y), (x + x) * (y - y)));
>
> x is always NaN so the real part presumably just needs to be quietened
> before returning - ie (x + x) would seem to be sufficient. Why does
> the code use ((x * x) * (y - y))?
The more complex expression is to handle more cases with the same code,
mainly so as to avoid extra tests and branches for the classification.
For exceptional cases, we don't care much whether the expressions in the
return statements are slower because they are more complex so as to be
general, or simpler and faster but larger because the classification is
finer. We could try to avoid extra tests and branches
for the usual cases by first doing a coarse classification and then a fine
classification of the exceptional sub-cases. This would probably be
efficient (if the CPU has reasonable branch prediction), but it would give
larger code.
Such expressions are complicated mainly to get the signs and NaN mixing
right in all cases.
Quoting this line again:
> % return (cpack((x * x) * (y - y), (x + x) * (y - y)));
> y has no restriction on its value so an arithmetic operation with x is
> a good way to convert it to a NaN. Wouldn't (y + x) be sufficient?
Remember that cos[h]() is even and sin[h]() is odd. (x * x) is to get
an even sign for cos[h]() and (x + x) is to get an odd sign for sin[h]().
After undoing this magic, the expression becomes:
> % return (cpack(cosh(x) * (y - y), sinh(x) * (y - y)));
and it remains to explain the (y - y) terms.
(It would probably work to use this expression. cosh() and sinh()
should get the signs and NaNs right. We optimize away these calls
because we can. This depends on knowing that x is Inf or NaN.
Oops, here x is only NaN, so there is more to explain:
- in other return statements, x can be either Inf or NaN. We copy
the magic expressions for simplicity.
- signs are not very important for NaNs, but I like to make them
consistent so that my test programs don't spew errors for unimportant
inconsistencies (it is easier to make the inconsistencies not occur than
to automatically decide if they are related to bugs. das@ doesn't
like spending any time on this detail).
- using the magic expressions gives the same behaviour for NaNs that
calling the functions would. For example, cosh(x) naturally wants
to square x. It may or may not do this explicitly for NaNs, but it
should for the same reasons that we should (more so), and in fact it
does: it returns x*x for both NaNs and Infs. This gets the sign
right for Infs, quietens NaNs, and gives a sign for the NaN
result in the most natural way (whatever the hardware does for x*x).
- Here we have the option of quietening NaNs using some other expression,
perhaps x+y (since we have another arg), or x+x (since we don't care
about the sign of NaNs). But this might give gratuitously different
results, depending on what the hardware does.
- x86 hardware mostly doesn't touch the sign of NaNs when they are
operated on. The main exceptions are that fabs() clears the sign
bit and negation may toggle the sign bit, depending on how negation
is implemented by the compiler (the i387 fchs instruction toggles the
sign, but compilers may instead emit a subtraction, which doesn't).
SSE gives annoying differences for NaNs, but IIRC these don't involve
the sign bit for non-mixing operations.
- the quiet bit can give annoying differences. NaNs could be mixed
using the expression x+y (if you prefer signs preserved) or x*y
(if you prefer signs multiplied). The result should be some function
of the inputs. IIRC, IEEE7xx specifies this, but doesn't specify
the details. Most hardware does something like looking at the bits
of the 2 NaNs and returning the bits of the "largest" one. The
main differences are whether the sign and quiet bits are to be
included in the decision, and if so, how they affect the ordering.
There are annoying differences from them sometimes affecting the
ordering, depending on the hardware and the operand size. The
above operates first purely on each of x and y, in order to quieten
them before mixing:
    x*y         might use the quiet bit in the ordering decision
    (x*x)*(y-y) kills the quiet bit before mixing, so that it cannot
                affect the ordering decision
Again, the second expression is what happens naturally if you
write cosh(x)*cos(y).
)
Back to explaining the (y - y) terms:
> % return (cpack(cosh(x) * (y - y), sinh(x) * (y - y)));
These are to optimize sin(y) and cos(y). sin() is odd and cos() is
even, but parity considerations no longer apply for y = +-Inf because
the value is now NaN, so its sign is meaningless. The sign of this
NaN is also unspecified, and we take advantage of this by letting
it be whatever the hardware gives for (y - y). We also want to be
compatible with what the function call would do, and fdlibm's
sin() and cos() in fact do exactly this, for the usual technical
reasons. Now y can be NaN, +-Inf or finite, and (y - y) handles
all cases. In sin() and cos(), this expression handles the NaN
and +-Inf cases, so we are consistent.
> My understanding is that:
> - Addition is generally faster than multiplication
Apart from the technical (parity) reasons, we don't really care about
efficiency in these exceptional cases, though we are doing perhaps
excessive optimizations to avoid the function calls.
Another organization of the function might start by evaluating the
4 sub-functions in all cases, so that the results can be used in all
cases later. IIRC, this is not done mainly because it isn't clear
that it doesn't give spurious exceptions. It also gives efficiency
in a natural way for special but non-exceptional cases like pure
real and pure imaginary.
> - Signs are irrelevant for NaN so merging the sign of x doesn't matter.
> - NaN + NaN returns the (quietened?) left-hand NaN
See above.
> - Inf + NaN returns the (quietened?) right-hand NaN
I forget if IEEE7xx requires this, but some C functions near this one
are required to treat a NaN as an indeterminate value which always combines
with Inf to produce Inf. The sign of a NaN could reasonably matter
here:
+Inf + +NaN = +Inf since the indeterminate value of +NaN cannot be -Inf
+Inf + -NaN = NaN since the indeterminate value of -NaN can be -Inf
> - finite + NaN returns the (quietened?) right-hand NaN
>
> Also, whilst things like ((x + x) * (y - y)) are reasonably efficient
Another point is that using consistent expressions allows common code
for the common parts (like x*x) if that is a good (space) optimization.
> on x86, few (if any) RISC architectures support exceptional conditions
> in hardware. My understanding is that SPARC would trap back into the
> userland handler (lib/libc/sparc64/fpu) on each operation unless both
> arguments and the result are normalised numbers. Explicitly fiddling
> with the FPU state would seem faster than multiple traps.
Here it is more the size of the magic expressions that gives slowness.
Almost all of them either start with a NaN or create one.
Even on x86, there is annoying slowness for overflow and underflow in
a MD way. Exceptional cases for exp*() run much slower on i386(core2)
than on i386(athlon-any(?)) or amd64(core2) or amd64(athlon), because
instructions (if not their results) that overflow and underflow are
handled slowly on core2(i387) but at full speed on all the other
combinations including core2(SSE).
Bruce
More information about the freebsd-numerics
mailing list