Use of C99 extra long double math functions after r236148

Mon Jul 9 03:41:46 UTC 2012

On 07/08/2012 09:01 PM, Steve Kargl wrote:

> Have you read Goldberg's paper?

I must admit that I had not.  I found it at:

http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html

> Not to mention, I've seen way too many examples of 'x - y'
> where cancellation of significant digits causes
> problems.  Throw in rather poor estimates of function
> results with real poor ULP and you have problems.

I think that if a program includes "x-y" where cancellation causes huge 
loss of digits, then the program should be considered highly flawed. 
The programmer might get lucky, and a few extra ULP might save his skin, 
but it would be better if the program catastrophically failed so that he 
would realize they need to do something smarter.

I got my first scientific calculator around 1975, and I remember my 
surprise when I discovered that (1/3)*3-1 on my calculator did not 
produce zero.  Years later I bought another calculator, and imagine my 
further surprise when (1/3)*3-1 did produce zero.  They must have done 
something very smart, I thought.

The problem is, these kinds of tricks can only help you so much. 
Calculations like 
arccos(arccos(arccos(arccos(cos(cos(cos(cos(x)))))))))-x are probably 
not going to be correct no matter how smart the programmer is.

The paper by Goldberg has some really nice stuff.  I teach a class on 
numerical methods.  One example I present is the problem using equation 
(4) of Goldberg's paper to solve quadratic equations.  I tell them that 
the smart thing to do is to use equation (4) or equation (5) depending 
on whether b has the same sign as sqrt(b^2-4ac).  This is a very good 
illustration of how to overcome the "x-y" problem, and in my opinion it 
has to be understood by the programmer, not the writer of the 
square-root package.  Trying to compensate by getting that lost drop of 
ULP out of square root is looking in the wrong direction.

But there is other stuff in his paper I don't like, and it comes across 
as nit-picking rather than really thinking through why he wants the 
extra accuracy.  I feel like he is saving the pennies, but losing the 
dollars.

For example, computing the quadratic formula when b^2 and 4ac are 
roughly equal (discussed in his proof of Theorem 4).  He says that 
roughly half the digits of the final answer will be contaminated.  He 
recommends performing the calculation with double the precision, and 
then retaining what is left.  The reason I don't like his solution is 
this.  The contamination of half the digits is real, not some artifact 
of the computing method.  Unless you know EXACTLY what a, b and c are, 
only half the digits of the quadratic formula are valid, no matter how 
you calculate it.  For example, maybe a is calculated by some other part 
of the program as 1/3.  Since 1/3 cannot be represented exactly in base 
2, it will not be exactly 1/3.  That part of the program doesn't know 
that this number is going to be used later on in the quadratic formula, 
so it computes it in single precision.  Now the quadratic formula 
algorithm converts a to double precision.  But a will still be 1/3 to 
single precision.  The only way to avoid this error is to perform EVERY 
pertinent calculation prior to the use of quadratic formula in double 
precision.  (And suppose instead the data comes from a scientific 
experiment - the programmer needs to be performing an error analysis, 
not hoping his implementation of quadratic formula is performed in 
double precision.)

Similarly, I am not a fan of the Kahan Summation Formula (Theorem 8). 
There are two reasons why I might compute large sums.  One is 
accounting, when I better be using high enough precision that the 
arithmetic is exact.  The other is something like numerical integration. 
  But in the latter case, unless the summands have a very special form, 
it is likely that each summand will only be accurate to within e.  Thus 
the true sum is only known to within n*e, and so the naive method really 
hasn't lost any precision at all.  And typically n*e will be so small 
compared to the true answer that it won't matter.  If it does matter, 
the problem is that n is too big.  The algorithm will take decades to 
run.  Focus efforts on producing a different numerical integration 
method, instead of getting that lost drop of ULP.  I cannot envision any 
non-contrived circumstance where the Kahan summation formula is going to 
give an answer that is significantly more reliable than naive summation. 
  I can envision a circumstance when the reduction in speed of the Kahan 
summation formula over naive summation could be significant.

I think the example I like best that illustrates floating point errors 
is inverting badly conditioned matrices.  And any scientist who works 
with large matrices had better know what ill-conditioned means.  The 
proper answer is that if your matrix is ill-conditioned, give up.  You 
need to look at the problem where your matrix came from, and rethink how 
you concert it into a matrix problem, so that the matrix is not 
ill-conditioned.  Chasing that last drop of ULP simply will not help one 
iota.

Stephen