Use of C99 extra long double math functions after r236148
Stephen Montgomery-Smith
stephen at missouri.edu
Mon Jul 9 03:41:46 UTC 2012
On 07/08/2012 09:01 PM, Steve Kargl wrote:
> Have you read Goldberg's paper?
I must admit that I had not. I found it at:
http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
> Not to mention, I've seen way too many examples of 'x - y'
> where cancellation of significant digits causes
> problems. Throw in rather poor estimates of function
> results with real poor ULP and you have problems.
I think that if a program includes "x-y" where cancellation causes huge
loss of digits, then the program should be considered highly flawed.
The programmer might get lucky, and a few extra ULP might save his skin,
but it would be better if the program catastrophically failed so that he
would realize they need to do something smarter.
I got my first scientific calculator around 1975, and I remember my
surprise when I discovered that (1/3)*3-1 on my calculator did not
produce zero. Years later I bought another calculator, and imagine my
further surprise when (1/3)*3-1 did produce zero. They must have done
something very smart, I thought.
The problem is, these kinds of tricks can only help you so much.
Calculations like
arccos(arccos(arccos(arccos(cos(cos(cos(cos(x)))))))))-x are probably
not going to be correct no matter how smart the programmer is.
The paper by Goldberg has some really nice stuff. I teach a class on
numerical methods. One example I present is the problem using equation
(4) of Goldberg's paper to solve quadratic equations. I tell them that
the smart thing to do is to use equation (4) or equation (5) depending
on whether b has the same sign as sqrt(b^2-4ac). This is a very good
illustration of how to overcome the "x-y" problem, and in my opinion it
has to be understood by the programmer, not the writer of the
square-root package. Trying to compensate by getting that lost drop of
ULP out of square root is looking in the wrong direction.
But there is other stuff in his paper I don't like, and it comes across
as nit-picking rather than really thinking through why he wants the
extra accuracy. I feel like he is saving the pennies, but losing the
dollars.
For example, computing the quadratic formula when b^2 and 4ac are
roughly equal (discussed in his proof of Theorem 4). He says that
roughly half the digits of the final answer will be contaminated. He
recommends performing the calculation with double the precision, and
then retaining what is left. The reason I don't like his solution is
this. The contamination of half the digits is real, not some artifact
of the computing method. Unless you know EXACTLY what a, b and c are,
only half the digits of the quadratic formula are valid, no matter how
you calculate it. For example, maybe a is calculated by some other part
of the program as 1/3. Since 1/3 cannot be represented exactly in base
2, it will not be exactly 1/3. That part of the program doesn't know
that this number is going to be used later on in the quadratic formula,
so it computes it in single precision. Now the quadratic formula
algorithm converts a to double precision. But a will still be 1/3 to
single precision. The only way to avoid this error is to perform EVERY
pertinent calculation prior to the use of quadratic formula in double
precision. (And suppose instead the data comes from a scientific
experiment - the programmer needs to be performing an error analysis,
not hoping his implementation of quadratic formula is performed in
double precision.)
Similarly, I am not a fan of the Kahan Summation Formula (Theorem 8).
There are two reasons why I might compute large sums. One is
accounting, when I better be using high enough precision that the
arithmetic is exact. The other is something like numerical integration.
But in the latter case, unless the summands have a very special form,
it is likely that each summand will only be accurate to within e. Thus
the true sum is only known to within n*e, and so the naive method really
hasn't lost any precision at all. And typically n*e will be so small
compared to the true answer that it won't matter. If it does matter,
the problem is that n is too big. The algorithm will take decades to
run. Focus efforts on producing a different numerical integration
method, instead of getting that lost drop of ULP. I cannot envision any
non-contrived circumstance where the Kahan summation formula is going to
give an answer that is significantly more reliable than naive summation.
I can envision a circumstance when the reduction in speed of the Kahan
summation formula over naive summation could be significant.
I think the example I like best that illustrates floating point errors
is inverting badly conditioned matrices. And any scientist who works
with large matrices had better know what ill-conditioned means. The
proper answer is that if your matrix is ill-conditioned, give up. You
need to look at the problem where your matrix came from, and rethink how
you concert it into a matrix problem, so that the matrix is not
ill-conditioned. Chasing that last drop of ULP simply will not help one
iota.
Stephen
More information about the freebsd-current
mailing list