Re: What to do about tgammal?

From: Steve Kargl <sgk_at_troutmask.apl.washington.edu>
Date: Sat, 04 Dec 2021 21:20:23 UTC
On Sat, Dec 04, 2021 at 08:40:56PM +0100, Hans Petter Selasky wrote:
> On 12/4/21 19:53, Steve Kargl wrote:
> > What to do about tgammal?

(trim some history)

> > 
> >    Interval         | Max ULP
> > -------------------+------------
> >   [6,171]           |  1340542.2
> >   [1.0662,6]        |    14293.3
> >   [1.01e-17,1.0661] |     3116.1
> >   [-1.9999,-1.0001] | 15330369.3
> > -------------------+------------
> > 
> > Well, I finally have gotten around to removing theraven@'s last kludge
> > for FreeBSD on systems that support ld80.  This is done with a
> > straightforward modification of the msun/bsdsrc code.  The limitation on
> > domain is removed and the accuracy substantially improved.
> > 
> >    Interval         | Max ULP
> > -------------------+----------
> >   [6,1755]          |    8.457
> >   [1.0662,6]        |   11.710
> >   [1.01e-17,1.0661] |   11.689
> >   [-1.9999,-1.0001] |   11.871
> > -------------------+----------
> > 
> > My modifications leverage the fact that tgamma(x) (i.e., the double
> > function) uses extended arithmetic to do the computations (approximately
> > 85 bits of precision).  To get the Max ULP below 1 (the desired upper
> > limit), a few minimax polynomials need to be determined and the mystery
> > around a few magic numbers needs to be unraveled.
> > 
> > Extending what I have done to an ld128 implementation requires much
> > more effort than I have time and energy to pursue.  Someone with
> > interest in floating point math on ld128 systems can provide an
> > implementation.
> > 
> > So, is anyone interested in seeing a massive patch?
> > 
> 
> Hi,
> 
> Do you need an implementation of tgamma() which is 100% correct, or a
> so-called speed-hack version of tgamma() which is almost correct?
> 
> I've looked a bit into libm in FreeBSD and I see some functions are
> implemented so that they execute quickly, instead of producing exact
> results. Is this true?
> 

I'm afraid that I don't fully understand your questions.

The ULP values listed above were computed by comparing the libm
tgammal(x) against a tgammal(x) computed with MPFR.  MPFR was configured
to have 256 bits of precision.  In other words, the 256-bit mpfr_gamma()
result is assumed to be exact when compared with a tgammal(x) that has
a 64-bit significand.

There is no speed hack with mpfr_gamma().

% time ./tlibm_lmath -l -s 0 -x 6 -X 1755 -n 100000 tgamma
Interval tested for tgammal: [6,1755]
100000 calls, 0.042575 secs, 0.42575 usecs/call
count: 100000
  xmu = LD80C(0xae3587b6f275c42c,     4,  2.17761377613776137760e+01L),
libmu = LD80C(0xb296591784078768,    64,  2.57371418855839536160e+19L),
mpfru = LD80C(0xb296591784078760,    64,  2.57371418855839536000e+19L),
 ULP = 8.28349
        6.04 real         6.02 user         0.01 sys

My test program shows that 100000 libm tgammal(x) calls took about 0.04
seconds, while the whole program takes 6 seconds to finish.  The run
time is dominated by MPFR.

In general, floating point arithmetic, where a finite number is
the result, is inexact.  The basic binary operations, +, -, *, and /,
are specified by IEEE 754 to have an error no larger than 0.5 ULP.

The mantra that I follow (and know bde followed) is to try
to optimize libm functions to give the most accurate result
as fast as possible.

-- 
Steve