bin/170206: complex arcsinh, log, etc.

Bruce Evans brde at optusnet.com.au
Sat Jul 28 04:30:12 UTC 2012


The following reply was made to PR bin/170206; it has been noted by GNATS.

From: Bruce Evans <brde at optusnet.com.au>
To: Stephen Montgomery-Smith <stephen at missouri.edu>
Cc: Bruce Evans <brde at optusnet.com.au>,
        Stephen Montgomery-Smith <stephen at FreeBSD.org>,
        FreeBSD-gnats-submit at FreeBSD.org, freebsd-bugs at FreeBSD.org
Subject: Re: bin/170206: complex arcsinh, log, etc.
Date: Sat, 28 Jul 2012 14:28:13 +1000 (EST)

 On Fri, 27 Jul 2012, Stephen Montgomery-Smith wrote:
 
 > On 07/27/2012 09:26 AM, Bruce Evans wrote:
 >
 >> >>>> For clog, the worst case that I've found so far has
 >> >>>> x^2+y^2-1 ~= 1e-47:
 >> >>>>
 >> >>>>      x = 0.999999999999999555910790149937383830547332763671875000000000
 >> >>>>      y = 0.0000000298023223876953091912775497878893005143652317201485857367516
 >> >>>>        (need high precision decimal or these rounded to 53 bits binary)
 >> >>>>      x^2+y^2-1 = 1.0947644252537633366591637369e-47
 >> >>>
 >> >>> That is exactly 2^(-156).  So maybe triple quad precision really
 >> >>> is enough.
 >
 > Furthermore, if you use the computation (x-1)*(x+1)+y*y (assuming, as
 > you do, x>y>0), only double precision is necessary.  This is proved in
 > the paper "Implementing Complex Elementary Functions Using Exception
 > Handling" by Hull, Fairgrieve, and Tang, ACM Transactions on
 > Mathematical Software, Vol 20, No 2, 1994.  They give a bound on the
 > error, which I think can be interpreted as being around 3.9 ULP.
 
 I'm using x*x-1+y*y in doubled precision, which I believe is better.
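 
 For reference, the grouping from the Hull/Fairgrieve/Tang paper looks
 roughly like this in C (an illustrative sketch only, not the code from
 the paper or from my patch; the function name is made up):
 
     /*
      * Sketch of the Hull/Fairgrieve/Tang grouping for x*x+y*y-1,
      * assuming x > y > 0 and x near 1.  The factored form
      * (x-1)*(x+1) avoids most of the cancellation that the naive
      * x*x-1 suffers, since x-1 is exact (Sterbenz) for x in [0.5, 2].
      */
     static double
     hft_xxpyym1(double x, double y)
     {
             return ((x - 1) * (x + 1) + y * y);
     }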
 
 I'm now thinking of the following refinement: suppose x is close to 1
 and y is close to 0 (other cases are easier to get right accidentally,
 but harder to analyze).  Then u = x-1 in non-doubled precision is exact
 and cancels most low bits.  So u*u is exact in non-doubled precision.
 Thus x*x-1 can be evaluated in mostly-non-doubled precision as u*u+2*u.
 2*u is exact, and u*u is a tiny correction term (less than about half
 an ulp relative to 2*u).  If we just wanted to pass this result to
 log1p(), it would round to 2*u and no doubled precision would be
 necessary.  But we need to retain it to add to y*y.  The necessary
 extra precision is much easier for addition than for multiplication.
 If I'm right that u*u is less than half an ulp, then (-2*u, u*u) is
 already in my normal form for doubled precision.
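 
 In code, the refinement is roughly this (a sketch; the struct and the
 names are invented for illustration, not what is in my patch):
 
     /*
      * For x close enough to 1 (|x-1| < about 2^-26), u = x-1 is
      * exact (Sterbenz) and has few significant bits, so u*u is also
      * exact.  Then x*x-1 == u*u + 2*u exactly, and (2*u, u*u) is
      * already a doubled-precision (hi, lo) pair, with |lo| less
      * than about half an ulp of hi.
      */
     struct dd { double hi, lo; };
 
     static struct dd
     xxm1(double x)
     {
             double u = x - 1;       /* exact for x in [0.5, 2] */
             struct dd r;
 
             r.hi = 2 * u;           /* exact: scaling by a power of 2 */
             r.lo = u * u;           /* exact tiny correction term */
             return (r);
     }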
 
 Oops, this is essentially the same as (x-1)*(x+1).  x-1 is u, and
 x+1 is u+2, so the product is u*u+2*u grouped in a numerically bad
 way (it either needs extra precision for x+1 and then for the
 multiplication, or loses accuracy starting with x+1).  Did you
 mean doubled precision, not double precision?
 
 This also avoids the following complication: double precision has an
 odd number of bits, so it can't be split in half for calculating x*x
 and y*y.  The usual 26+27 split would give an error of half an ulp in
 doubled double precision.  The above avoids this for x*x in some
 critical cases, and I hope in all critical cases.
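 
 For concreteness, the usual split is the standard Veltkamp one
 (a sketch; overflow for huge arguments is ignored):
 
     /*
      * Split a 53-bit double into a 26-bit high part and a low part
      * carrying the remaining bits (one of them in the sign).  53 is
      * odd, so there is no even split.  134217729 = 2^27 + 1.
      */
     static void
     split(double a, double *hi, double *lo)
     {
             double t = a * 134217729;
 
             *hi = t - (t - a);
             *lo = a - *hi;
     }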
 
 Cases where x*x and y*y are both nearly 0.5 have other interesting
 points.  If x*x >= 0.5, then x*x-1 is exact in doubled precision.  When
 x*x < 0.5, x*x-1 is not necessarily exact in doubled precision.  I
 handle these cases by writing the expression as (x*x-0.5)+(y*y-0.5).
 When x*x >= 0.5 and y*y >= 0.25, both methods give exact subtractions
 and I think they give the same result.
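 
 Written out (sketch only, with the doubled-precision bookkeeping
 elided; xx and yy stand for the hi parts of x*x and y*y):
 
     /*
      * Grouping for the case where x*x and y*y are both near 0.5.
      * Subtracting 0.5 from each square first makes each subtraction
      * exact (Sterbenz) whenever the square lies in [0.25, 1].
      */
     static double
     xxpyym1_half(double xx, double yy)
     {
             return ((xx - 0.5) + (yy - 0.5));
     }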
 
 > And I think you will see that your example does not contradict their theorem. 
 > Because in your example, x-1 will be rather small.
 >
 > So to get reasonable ULP (reasonable meaning 4 rather than 1), double 
 > precision is all you need.
 
 I think you mean doubled precision.
 
 4 ulps in undoubled precision is mediocre, but 4 in doubled precision
 is far more than needed; I hope to get 0.5 + epsilon.  The result
 starting with double precision would then be accurate with 53-4 or
 (54-0.5-epsilon) extra bits if the log1p() of it were taken in
 infinite precision.  But log1p() has finite precision, and I'm seeing
 the effects of slightly more than half a double precision bit being
 lost on conversion of the doubled double precision x*x+y*y-1 when it
 is passed to log1p(), and then another slightly more than half [...]
 lost to imperfect rounding of log1p().  So one of my tests is to
 remove the log1p() source of inaccuracy by replacing it with log1pl().
 In float precision, exhaustive testing is possible though not
 complete; all cases tested with |z| as close as possible to 1 were
 perfectly rounded.
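 
 The test looks roughly like this (a sketch; hi and lo are assumed to
 hold the doubled-precision value of x*x+y*y-1 computed as above):
 
     #include <math.h>
 
     /*
      * Take the log in long double precision so that the only error
      * left to measure is the one accumulated in x*x+y*y-1 itself.
      * The real part of clog(z) is log1p(x*x+y*y-1) / 2.
      */
     static double
     clog_re_test(double hi, double lo)
     {
             return (log1pl((long double)hi + lo) / 2);
     }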
 
 Bruce

