What to do about tgammal?

From: Steve Kargl <sgk_at_troutmask.apl.washington.edu>
Date: Sat, 04 Dec 2021 18:53:52 UTC

A long time ago (2013-09-06), theraven@ committed a kludge that mapped
several missing long double math functions to double math functions
(e.g., tanhl(x) was mapped to tanh(x)).  Over the next few years, I
(with reviews by bde and das) provided Intel 80-bit (ld80) and IEEE
128-bit (ld128) implementations for some of these functions; namely,
coshl(x), sinhl(x), tanhl(x), erfl(x), erfcl(x), and lgammal(x).  The
last remaining function is tgammal(x).  If one links a program that uses
tgammal(x) with libm, one sees

  /usr/local/bin/ld: fcn_list.o: in function `build_fcn_list':
  fcn_list.c:(.text+0x7c4): warning: tgammal has lower than advertised
  precision

The warning is actually misleading.  Not only does tgammal(x) have
*MUCH* lower precision, it also has a reduced domain.  That is,
tgammal(x) produces +inf for x > 172, whereas it should produce a
finite result for x up to 1755 (or so).  On amd64-*-freebsd, testing
1000000 values in each of the intervals below demonstrates pathetic
accuracy.

Current implementation via imprecise.c

  Interval         | Max ULP
-------------------+------------
 [6,171]           |  1340542.2
 [1.0662,6]        |    14293.3
 [1.01e-17,1.0661] |     3116.1
 [-1.9999,-1.0001] | 15330369.3
-------------------+------------

Well, I have finally gotten around to removing theraven@'s last kludge
for FreeBSD on systems that support ld80.  This is done with a
straightforward modification of the msun/bsdsrc code.  The domain
limitation is removed and the accuracy is substantially improved:

  Interval         | Max ULP
-------------------+----------
 [6,1755]          |    8.457
 [1.0662,6]        |   11.710
 [1.01e-17,1.0661] |   11.689
 [-1.9999,-1.0001] |   11.871
-------------------+----------

My modifications leverage the fact that tgamma(x) (i.e., the double
function) uses extended arithmetic for its computations (approximately
85 bits of precision).  To get the Max ULP below 1 (the desired upper
limit), a few minimax polynomials need to be determined and the mystery
around a few magic numbers needs to be unraveled.

Extending what I have done to an ld128 implementation requires much
more effort than I have the time and energy to pursue.  Someone with
an interest in floating-point math on ld128 systems can provide an
implementation.

So, is anyone interested in seeing a massive patch?

-- 
Steve