Implementation of expl()
Bruce Evans
brde at optusnet.com.au
Mon Dec 10 03:22:17 PST 2007
On Sun, 9 Dec 2007, Steve Kargl wrote:
> On Sun, Dec 09, 2007 at 04:25:05PM -0500, David Schultz wrote:
>> On Fri, Nov 02, 2007, Steve Kargl wrote:
>>> With all the success of developing several other missing
>>> C99 math routines, I decided to tackle expl() (which I
>>> need to tackle tanhl()).
>>
>> Hmm, great, but where's the patch? :) Maybe the mailing list
>> software ate it.
>
> This is the current version. I need to revise how I computed
> the ploynomial coefficient. In short, I mapped r in
> [-ln(2)/2:ln(2)/2] into the range x in [-1,1] for the Chebyshev
> interpolation, but I never scaled x back into r. This is the
> reason why the lines "r = r * TOLN2;" exists.
>
> I don't remember if bde sent me comments on this code. I sure
> he has plenty. :)
I don't think I commented about expl() before. Sorry, it's further
from what I would like than the trig functions.
Here are some better polynomial approximations for it.
----------------------------------------------------------------------------
(1) 64-bit precision, approximating exp(), saves a term and should be more
accurate. Note the difference in the plots between the Chebyshev
approximation and the final approximation. The uglyness and extra
error in the latter are due to rounding the very-high-precision
coeffs used in the former to 64 bits. The final approximation
uses a Remez algorithm on rounded coeffs to try to choose the best
set of rounded coeffs (!= the best set of coeffs, rounded). Its
plot is ugly because my algorithm is unstable for functions that
aren't even, so I had to stop it after 1 significant iteration
(adjusting the P3 term from 0xaaaaaaaaaaaaaaab.0p-66 to ...aa8.0p-66).
If it worked properly then it would push all the extrema back to the
top and bottom of the plot, and be slightly more accurate than the
Chebyshev approximation. Remez gains very little over Chebyshev in
infinite precision (about a factor of 2), but can compensate for
huge errors (a factor of 100[0]) due to unfortunate rounding of
coeffs.
----------------------------------------------------------------------------
--- Simple Lagrange interpolation at Chebyshev points ---
5.975e-24 |'''''x'''''''''x'''''''''''x''''''''''''x''''''''''_x''''''_''"
| : x " " :: : :
|_ : : : _ " : " : : : :: :
|: : x : : : : : : : : :: :
|: " : : : : " : " : x : x :
|: : : " : _ : : : " : : : :
|: : : : : : : " : : : : : :
|: : : : " : : : : : : : : :
|: : : : : : x : : : : _ : :
|: : : : : : : : " : : : : :
|:: : x _ : _ : : : : : : ::|
`::`:```:````:`````:`````:``````:`````"`````:`````"````_```:`::`
: : : : : _ : : : : : : : :|
: : " : : : : x : : : : : :|
: : : : : : : : : _ : : : :|
: : : : : : _ : : : : : : :|
: : : : _ : : : " : _ : : :|
: x: " : x : _ : : : : : :|
: :: : : : : : : : : _ " :|
: : :: : x : _ x : : : _|
: _ :_ _ _ _ : |
-5.95e-24 x........."...........x............x...........x........._.....|
-0.34658 0.34658
--- Lagrange/Chebyshev with coeff adjusted to 64-bit precision ---
adjn n = 64 k = 3 j = 120 toter 0
--- rounded coeff ---
P0 = 1.000000000000000000000000000000000000000, /* 0x8000000000000000.0p-63 */
P1 = 1.000000000000000000000000000000000000000, /* 0x8000000000000000.0p-63 */
P2 = 0.5000000000000000000000000000000000000000, /* 0x8000000000000000.0p-64 */
P3 = 0.1666666666666666666310000000000000000000, /* 0xaaaaaaaaaaaaaaa8.0p-66 */
P4 = 0.04166666666666666666100000000000000000000, /* 0xaaaaaaaaaaaaaaa9.0p-68 */
P5 = 0.008333333333333338066950000000000000000000, /* 0x8888888888889e5d.0p-70 */
P6 = 0.001388888888888889569550000000000000000000, /* 0xb60b60b60b60cf28.0p-73 */
P7 = 0.0001984126984124767297770000000000000000000, /* 0xd00d00d00c013acd.0p-76 */
P8 = 0.00002480158730156209466100000000000000000000, /* 0xd00d00d00c1851e1.0p-79 */
P9 = 0.000002755731927361941412700000000000000000000, /* 0xb8ef1d304cd05f90.0p-82 */
P10 = 0.0000002755731927069719307580000000000000000000, /* 0x93f27dbffa11c00d.0p-85 */
P11 = 0.00000002505205082467052704390000000000000000000, /* 0xd7320ad84885e745.0p-89 */
P12 = 0.000000002087671071533106249630000000000000000000, /* 0x8f76b2a8ea87f9b3.0p-92 */
P13 = 1.60924147227467723196999999999999999999 E-10, /* 0xb0f01edf2f5fac20.0p-96 */
P14 = 1.14941936278242808142000000000000000000 E-11, /* 0xca353f02fa6b7741.0p-100 */
7.252e-24 |''''''''''''''''_''''''''''''''''''''_""_'''''''''''''''''''''|
| " x x |
| x _ x _" |
| " x " |
--------------_---------------_xxx"--------x------_-------------
| _x : " _" _ |
| : : _ x x _ : |
| : " _ x : |
| _ : "__x x _ : x |
| : _ : x_ x : |
| : " : " : |
|_ : xx : : x |
|: : _ : : x
|: x " : :
|:: : x : :
|::: ::
: _: :|
: _ :|
: :|
: x|
: |
-2.79e-23 _..............................................................|
-0.34658 0.34658
adjn n = 64 k = 3 toter 0
----------------------------------------------------------------------------
(2) 113-bit precision, approximating exp().
----------------------------------------------------------------------------
--- Simple Lagrange interpolation at Chebyshev points ---
3.239e-38 x''''x'''''x''''''_''''''''x''''''''"''''''''_''''''"'''''"''''"
: : : :x : : x: : : |
|: : :: :: : _ _ : :: :: : _|
|" : _ : : : " : : " : : : _ : :|
|: : : : : : : : : : : : : : : : : :|
|: : : : x : : : : : : : : x : : : :|
|: : : : : " : : : : : : " : : : : :|
| : : _ : : : " : : : : " : : : x : : |
| : _ : : : : : : : : : : : : : : _ : |
| : : : : : : : _ " " _ : : : : : : : |
| : : : : : : : : : : : : : : : : : : |
``x`:`:``:```:``:````:```:```:````:```:```:````:``:```:``:`:`x``
| : : : : : : : : : : : : : : : : : : |
| : : : x : : : : : : : : : : x : : : |
| : : : : x x : : : : : : x _ : : : : |
| :: : : : : " : : : : " : : : : :: |
| : : : : : : : x x : : : : : : : |
| : :: : : : " : : " : : : :: : |
| : _: :: : : : : : : :: :: : |
| : :: :: :: :: :: :: :" : |
| : : :_ _: :: :_ :: : : |
-3.13e-38 |..x....x....."........x.......xx......._......."".....x...._..|
-0.34658 0.34658
--- Lagrange/Chebyshev with coeff adjusted to 113-bit precision ---
adjn n = 113 k = 1 j = 120 toter 0
--- rounded coeff ---
P0 = 1.000000000000000000000000000000000000000, /* 0x10000000000000000000000000000.0p-112 */
P1 = 1.000000000000000000000000000000000000000, /* 0x10000000000000000000000000000.0p-112 */
P2 = 0.4999999999999999999999999999999999500000, /* 0x1ffffffffffffffffffffffffffff.0p-114 */
P3 = 0.1666666666666666666666666666666666600000, /* 0x15555555555555555555555555555.0p-115 */
P4 = 0.04166666666666666666666666666668415500000, /* 0x155555555555555555555555560af.0p-117 */
P5 = 0.008333333333333333333333333333336938300000, /* 0x11111111111111111111111111a6d.0p-119 */
P6 = 0.001388888888888888888888888886437923800000, /* 0x16c16c16c16c16c16c16c15fa9389.0p-122 */
P7 = 0.0001984126984126984126984126980627689100000, /* 0x1a01a01a01a01a01a01a0191e8217.0p-125 */
P8 = 0.00002480158730158730158730175735685579700000, /* 0x1a01a01a01a01a01a01ad93230fd1.0p-128 */
P9 = 0.000002755731922398589065255750604381149600000, /* 0x171de3a556c7338faac28603831fb.0p-131 */
P10 = 0.0000002755731922398589065187949851278033800000, /* 0x127e4fb7789f5c72e69d53c0ac1d3.0p-134 */
P11 = 0.00000002505210838544171877444493611961585000000, /* 0x1ae64567f544e38fdb412a0eb7de0.0p-138 */
P12 = 0.000000002087675698786810064988525091623603400000, /* 0x11eed8eff8d8981cadc22e13e3610.0p-141 */
P13 = 1.60590438368216158655710381729946340000 E-10, /* 0x16124613a86d09fa08f48ca8c4d5e.0p-145 */
P14 = 1.14707455977270929588227069467205860000 E-11, /* 0x193974a8c07640267fec271f370a0.0p-149 */
P15 = 7.6471637318180850420027983995938208999 E-13, /* 0x1ae7f3e733b16c573381f7ce23115.0p-153 */
P16 = 4.77947733504221136080232269675803310000 E-14, /* 0x1ae7f3e773e8b2e2b7a23c9a525ff.0p-157 */
P17 = 2.81145725589067097361019490883767970000 E-15, /* 0x1952c7706c73b79a1577d9ebcf121.0p-161 */
P18 = 1.56191903751316458500805670560905000000 E-16, /* 0x168276d284fb449ca1ff48bfbb891.0p-165 */
P19 = 8.2206265778415557385251561200634072999 E-18, /* 0x12f499f725c9f790053993c3f2b17.0p-169 */
P20 = 4.11616915419431123065482372671396870000 E-19, /* 0x1e5f394471e23048e2a82e8f22b34.0p-174 */
P21 = 1.9600698694144411728713343986 E-20, /* 0x1723f29b62196e8abacb3c5c40000.0p-178 */
9.874e-37 "''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''|
|_ |
| |
| _ |
| : |
| : |
| "x |
| " |
| _ |
| |
| _ |
| |
| "x_ |
| x |
| x |
| _ _x
| x_ x |
| "_ xx _" x |
| """x " "x |
| x x"x___" |
| " __ __ _" |
0 ,,,,,,,,,,,,,,,,,,,,,,"xxx",,"x__x",,"x__x,,,,,,,,,,,,,,,,,,,,,,
-0.34658 0.34658
adjn n = 113 k = 1 toter 0
----------------------------------------------------------------------------
(3) 64-bit precision, approximating nqexp() := x * (exp(x) + 1) / (exp(x) - 1)
exp(x) can be recovered from nqexp(), and the latter has much better
numerical behaviour. E.g., it needs only 7 coeffs instead of 15.
The Remez search completed and equalized all the extrema. It's
interesting that The Chevyshev approximation equalized the extrema
for exp() but not for nqexp().
----------------------------------------------------------------------------
--- Simple Lagrange interpolation at Chebyshev points ---
2.327e-21 "''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''"
: :
: _" "_ :
: :: :: :
: :: :: :
: : : "" "" : : :
: : : x _ _ x : : :
: : x : : : : x : :
: : : : : : : : : :
: : : _ " __ __ " _ : : :
: : : : " "x_ _x" " : : : :
`:_```:```:``````x``````"``````""``````"``````x``````:```:```_:`
|:: : : x x : : ::|
|:: _ _ x _ _ x _ _ ::|
|:: : : _ _ _ _ : : ::|
|:: : : " " : : ::|
|: : : : : :|
|: : x x : :|
|: x x :|
|: x x :|
|: :|
-2.23e-21 |_............................................................_|
-0.34658 0.34658
--- Lagrange/Chebyshev with coeff adjusted to 64-bit precision ---
adjn n = 64 k = 13 toter 0
C-1 = 2.000000000000000000000000000000000000000, /* 0x8000000000000000.0p-62 */
C0 = 0.1666666666666666660071100000000000000000, /* 0xaaaaaaaaaaaaaa7a.0p-66 */
C1 = -0.002777777777777671015759280000000000000000, /* -0xb60b60b60b5904a2.0p-72 */
C2 = 0.00006613756613181145420083840000000000000000, /* 0x8ab355dfd4d5cfe3.0p-77 */
C3 = -0.000001653439010798086927524850000000000000000, /* -0xddebbb58747e58c9.0p-83 */
C4 = 0.00000004175172569487852482302880000000000000000, /* 0xb35282048125b16f.0p-88 */
C5 = -0.000000001045797728772232364397340000000000000000, /* -0x8fbbbc85f0e61575.0p-93 */
1.275e-21 "''"''''''''"''''''''''''"x''''''''''x"''''''''''''"''''''''"''"
: : x _ _ x : :
: :: " : : " " : : " :: :
: :x : : : : : : x: :
: : : : : : x x : : : : : :
: : : : x " : : " x : : : :
: : : _ : : : : : : _ : : :
: : : : : : " " : : : : : :
: " : : : : : : : : " :
: : : : : x x x x : : : : :
|:: : : _ : __ : _ : : ::|
`::``_```:`````:``````:``````````````````:``````:`````:```_``::`
|:: : : : : : : : : ::|
|:: : x : : : : x : ::|
|: : : : " " : : : :|
|: : : : : : : : : :|
|: : : " : : " : : :|
|: : : : : : : : : :|
|: : : : " " : : : :|
|: " " : : : : " " :|
|_ " : : " _|
-1.28e-21 |......_.........._"........................"_.........._......|
-0.34658 0.34658
adjn n = 64
----------------------------------------------------------------------------
(4) 113-bit precision, approximating nqexp(). The Remez search was stable
but made negative progress after the C3 (?) term so I stopped it early.
It had only almost equalized the extrema when stopped.
----------------------------------------------------------------------------
--- Simple Lagrange interpolation at Chebyshev points ---
4.567e-37 |''''''"''''''''''''''''''''''''''''''''''''''''''''''''"''''''|
| " : : " |
| : : _ _ : : |
| :: : : :_ _: : : :: |
| :: : : :: :: : : :: |
| x: : " : : x x : : " : :x |
| :: _ : : : x x : : : _ :: |
| :: : : " : : " " : : " : : :: |
| :: : : : _ : __ __ : _ : : : :: |
--:-:-:-:---:--:---x---x---_--"xx"--_---x---x---:--:---:-:-:-:--
| : : : : : : : : : : : : : : |
| : : : : : : : x " " x : : : : : : : |
| : :: : : : : " " : : : : :: : |
|: :: : : x " " x : : :: :|
|: :: " " " " :: :|
|: :: : : " " : : :: :|
|: _: : : :_ :|
|: : _ _ : :|
|: " " :|
|: :|
|" "|
-5.88e-37 _.............................................................._
-0.34658 0.34658
--- Lagrange/Chebyshev with coeff adjusted to 113-bit precision ---
adjn n = 113 k = 3 j = 120 toter 0
--- rounded coeff ---
C-1 = 2.000000000000000000000000000000000000000, /* 0x10000000000000000000000000000.0p-111 */
C0 = 0.1666666666666666666666666666666662000000, /* 0x15555555555555555555555555542.0p-115 */
C1 = -0.002777777777777777777777777777553326200000, /* -0x16c16c16c16c16c16c16c16b85140.0p-121 */
C2 = 0.00006613756613756613756613752850992610100000, /* 0x11566abc011566abc0114a7e047ee.0p-126 */
C3 = -0.000001653439153439153439150292458494169500000, /* -0x1bbd779334ef0aac6588d54fb28a1.0p-132 */
C4 = 0.00000004175351397573619780544017497215914800000, /* 0x166a8f2bf70ebd9cab06da3dfcbe9.0p-137 */
C5 = -0.000000001056838027737493956831265550261534200000, /* -0x122805d6442668558fc0ca2fcc585.0p-142 */
C6 = 2.67650730612754829770358952948658760000 E-11, /* 0x1d6db2c4e01fe52c6eb7674cf86f8.0p-148 */
C7 = -6.7793605801133740547903998874808967000 E-13, /* -0x17da4e1ebc3556f34306a709a13c6.0p-153 */
C8 = 1.71721130775699965239461146049180330000 E-14, /* 0x1355864cf343fe6e71f6644193ab0.0p-158 */
C9 = -4.34912151080995129543227548203712180000 E-16, /* -0x1f56b690b4b95dd505b8a8e6666b1.0p-164 */
C10 = 1.08203893915267000322982006799182099999 E-17, /* 0x18f33b03a36ff97329d6990c65e9c.0p-169 */
3.741e-37 |''''''"''''''''''''''''''''''''''''''''''''''''''''''''"''''''|
| : _ _ : |
| :: : _ " " _ : :: |
| :: :_ ": :" _: : : |
| _ x : : : :: " " " " :: : : : x _ |
| : : : : : : : : : : : : : : : : : : |
| :" : : _ : : : : : : : : : : x _ : ": |
| :: : " : : : x : : : : x : : : : : :: |
| :: : : : : : : : " " : : : : : : : :: |
| :: : : : : " : x : : x : " : : : : :: |
| :: : : : : : : : :: : : : : :: : :: |
--::-:---::---x---:---:---:----""----:---:---:---x---::---:-::--
| :: : :: : : : : : : : : :: : :: |
|: : : :" : : : : : : : : "_ : : :|
|: :: " : : _ : : _ : : :: :|
|: :: : : : : : : : : :: :|
|: :: : x : " " : x : :: :|
|: :_ : : : : : : : : _: :|
|: :: _: :: :: :_ :: :|
|: : : _: :_ : : :|
|: : _ x x _ : :|
-3.37e-37 _x.."......................................................"..x_
-0.34658 0.34658
adjn n = 113 k = 3 toter 0
---
Attempts to use 64-bit coeffs for the 113-bit approximation were
unsuccessful -- the error was about 5e-24, but about 2^-(113+5) =
3e-36 is needed. The interval is just too wide for this to work.
Probably only the first couple of coeffs need to have full precision.
Bruce
More information about the freebsd-standards
mailing list