cpow and clog

Wed Nov 8 14:33:08 UTC 2017

On Wed, 8 Nov 2017, Montgomery-Smith, Stephen wrote:

> On 11/06/2017 02:41 PM, Steve Kargl wrote:
>> On Mon, Nov 06, 2017 at 08:49:43PM +0100, Michael Danilov wrote:
>>> Hello,
>>>
>>> I would like to have some feedback on my attempt to import OpenBSD
>>> code for cpow and clog:
>>>
>>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221341
>>> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=187693
>
> The problem with this implementation of clog is that they will produce
> very inaccurate results when |z| is close to 1.  A proper implementation
> is given in the paper:
> T. E. Hull, Thomas F. Fairgrieve, Ping Tak Peter Tang, â€œImplementing
> Complex Elementary Functions Using Exception Handlingâ€, ACM Transactions
> on Mathematical Software, Volume 20 (1994), pp 215â€“244, Corrigenda, p553.
> 
> They use a precision which is twice that of the required precision.  For
> example, if you want log(sqrt(x^2+y^2)) accurate to about 17 digits, you
> will need to compute it to 34 digits.

I don't see how that can be enough.  I found that about 3 times as much
precision is needed for the worst case in double precision.  Closer to
160 bits than 150, and 2 times as much much is only 106 bits.

This defeated my intuition, but it was easy to find the worst case by
searching a few billion cases near z = 1 to improve my intuition, then
a few more in a smaller search area to find the worst case.  The authors
of the paper were/are real numerical analysts, but CPUs were too slow in
1004 for them to routinely search billions of cases to check their
algorithms.

> If all you care about is getting a a good relative error for the total
> answer (rather than insisting on a good relative error for the real and
> imaginary parts separately), then for |x+I*y| close to one, compute the
> real part using
>
> 0.5*logp1((x-1)*(x+1)+y*y)

z = 1 is a very non-exceptional value, so clog() should be very accurate
near it.

I just noticed that draft C99 (n869) Annex F (IEEE754 binding) is bad
enough to say that cpow(z, c) is implicitly specified by the simplistic
formula:

     cpow(z, c) = cexp(c * clog(z))

(this is in a section with similar specifications like:

     ccos(z) = ccosh(I * z))

which are not incorrect since they don't lose precision or give spurious
exceptions (provided I * z is not evaluated simplistically)).  This bug
is missing without Annex F, by allowing any quality of implementation.
This bug is fixed in C99 through draft C11 (n1570) Annex by not requiring
anything extra except raising all relevant exception and saying in a
footnote that the above low quality implementation conforms since it
raises all relevant exceptions (it might raise spurious exceptions but
this conforms).

Anyway, if you use the simplistic formula for cpow(), then clog() should
be very accurate to avoid exponentiating a large error.  Even the best
possible minimal error of 0.5 ulps is not good.

>>> What happened to the alternative implementation mentioned in the thread below?
>>
>> bde has an implementation of clog[fl].  He may someday
>> commit it.  I don't know if anyone ever worked on cpow[fl].
>> I stopped working on powl and tgammal when I returned my
>> commit bit due to differences with "higher-ranking" committers.
>
> bde's implementation was fantastic.  He managed to use two double
> variables to represent a single quadruple precision real (using a
                                   quad-double
> technique which I believe is standard, but it was new to me).

Thanks.  I forgot the details, but just checked some.  It uses more like
4 double variables to represent a single tri-double precision variable.
Full quad-double precision would require something like 3 ripple carries
per addition and I managed to reduce this something like only 1 ripple
carry per addition by doing bi-double additions in a delicate order.
This loses some precision but retains enough.  This is least delicate
near z = 1, but it is more accurate when it works, and not much slower,
so it is used for approx. 0.5 < |z|^2 < 3.

I don't know if this is standard.  Other parts of the algorithm are standard.

> Why he never committed it, I don't know.  People on this list take way
> too long to commit excellent software.
>
> For example, I had code for cacos, catan, etc, ready for a long time
> before someone else stepped in and committed it.  And I tested it to
> death.  I even found a bug in
> T. E. Hull, Thomas F. Fairgrieve, Ping Tak Peter Tang, â€œImplementing the
> complex arcsin and arccosine functions using exception handlingâ€, ACM
> Transactions on Mathematical Software, Volume 23 (1997) pp 299â€“335.

We were waiting for you to commit your code :-).  It is still not complete.
It is missing many engineering details:
- file names not in a normal namespace (ca* instead of normal e_c* or s_c*).
   East to fix, especially before committing.
- multiple functions per file.  Not so easy to fix.  I didn't manage to fix
   it for my log*l().  I tried many not so good splittings using #include's
   of a .c file with ifdefs, and different kernel functions, and inline
   functions.  Inline functions are now optimized well enough by clang
   (but static ones waste space and extern ones are hard to manage).  My
   clog*() file has the same problem.  Steve split it, but I haven't got
   around to even testing that.
- switching the rounding precision for long double functions on i386.  I
   couldn't see a clean way to do this.  It can be done by sprinkling
   ENTERIT() and RETURNI() macros.  ENTERIT() (ENTERI() with the return
   type specified) is also needed for clogl(), but hasn't been committed
   yet.
- comment stripping and/or generating source files from a template.  You
   used this for development.  The scripts to do this were not very suitable
   for committing, but perhaps better than nothing.  IIRC, the template
   was the double version with a few extras.  The double version has full
   comments, and all comments are stripped in the float and double versions,
   including in the committed versions.  But the normal style is only
   strip large comments and adjust small comments in the float and double
   versions.  My clog*() handles this problem in the opposite way not stripping
   any comments and minimizing differences in comments.  It also minimizes
   difference in code by manual inspection/rewrite instead of a template.

Bruce