j0 (and y0) in the range 2 <= x < (p/2)*log(2)

Bruce Evans brde at optusnet.com.au
Sat Sep 8 18:10:09 UTC 2018


On Sat, 8 Sep 2018, Steve Kargl wrote:

> On Sat, Sep 08, 2018 at 10:09:36PM +1000, Bruce Evans wrote:

>> [even more details removed]

>> I don't like rational function approximations (with a nontrivial
>> denominator), and don't fully understand why they are often more
>> accurate.  Now I suspect that their main advantage is reducing the
>> number of zeros in the approximating function, and that this advantage
>> is especially small here since we use small intervals to avoid zeros
>> in the function being approximated.  Higher degree polys have a higher
>> number of zeros, and these zeros must all be outside of the interval
>> except ones matching the zeros of the function being approximated.
>> Using a rational function reduces the degree, by using poles instead
>> of zeros.  The poles must also be outside of the interval, but this is
>> apparently easier to arrange.
>
> You're not the only one who prefers polynomial approximations
> over rational approximations!  I went looking for information about
> when one method should be used instead of the other.  I've looked
> in standard books in numerical analysis (e.g., Ralston and
> Rabinowitz, Hildebrand, etc).  Never found a simple rule.  I've
> come to the conclusion:  try to find a minimax polynomial first
> and fall back to rational approximation if the polynomial is bad.
>
> Just found https://dlmf.nist.gov/3.11
>
> You might find the Example interesting.

Will check.

>> ...
>> I fear that the recursion is very inaccurate.  It seems to do about n
>> multiplications, so must have an inaccuracy of >= n/2 ulps.
>
> For the regions where I've used recursion (n  and x < 20000), recursion
> works well.  I have a paper by Olver that discusses the stability
> of forward and backwards recursion.  You can also find info at
>
> https://dlmf.nist.gov/10.74#iv

Getting too much to check :-).
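
While we are on recurrences: a minimal sketch (mine, illustrative only,
not from your code) of the backward recurrence
J_{k-1}(x) = (2k/x)*J_k(x) - J_{k+1}(x) with Miller-style normalization,
as in DLMF 10.74(iv).  The starting order and seed are ad hoc and there
is no rescaling, so it is only good for modest n and x, but it shows
where the ~n/2 ulp worry comes from: each downward step costs a multiply
and a subtract.

#include <math.h>

/*
 * Sketch of Miller's backward recurrence for J_n(x): recurse downward
 * from an arbitrary tiny seed well above max(n, x), then normalize
 * using J_0(x) + 2*sum_{m>=1} J_{2m}(x) = 1.
 */
double
jn_backward(int n, double x)
{
	int k, start;
	double fkp1, fk, fkm1, fn, sum;

	/* Crude starting order; a real implementation chooses this carefully. */
	start = (x > n ? (int)x : n) + 20;
	fkp1 = 0.0;		/* plays the role of J_{start+1} */
	fk = 1e-30;		/* arbitrary tiny seed for J_start */
	fn = 0.0;
	sum = 0.0;
	for (k = start; k >= 1; k--) {
		fkm1 = (2.0 * k / x) * fk - fkp1;
		if (k - 1 == n)
			fn = fkm1;
		if (((k - 1) & 1) == 0)		/* even orders enter the sum */
			sum += (k - 1 == 0) ? fkm1 : 2.0 * fkm1;
		fkp1 = fk;
		fk = fkm1;
	}
	return (fn / sum);	/* no rescaling, so don't push n or x too far */
}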

> ...
> Here's exhaustive testing of j0f(x) on 2 <= x < 4
>
> mobile:kargl[230] ./testf -m 2 -M 4 -E double -D
> Interval tested: [2.0000e+00,4.0000e+00]
>        ulp <= 0.50:  38.034% ( 3190526) |  38.034% ( 3190526)
> 0.50 <  ulp <  0.55:   3.435% (  288112) |  41.469% ( 3478638)
> 0.55 <= ulp <  0.60:   3.343% (  280402) |  44.811% ( 3759040)
> 0.60 <= ulp <  0.70:   6.358% (  533378) |  51.170% ( 4292418)
> 0.70 <= ulp <  0.80:   5.903% (  495183) |  57.073% ( 4787601)
> 0.80 <= ulp <  0.90:   5.460% (  458035) |  62.533% ( 5245636)
> 0.90 <= ulp <  1.00:   4.960% (  416092) |  67.493% ( 5661728)
> 1.00 <= ulp <  2.00:  26.941% ( 2259993) |  94.434% ( 7921721)
> 2.00 <= ulp <  3.00:   5.057% (  424212) |  99.491% ( 8345933)
> 3.00 <= ulp        :   0.509% (   42675) | 100.000% ( 8388608)
> Max ulp: 5.246863 at 3.98081970e+00

I plugged in my polynomial approximations with your hi+lo decomposition
in the second interval [2, 2.857].  This reduces the maximum error for j0f
from 896571 ulps to 3.8591 ulps on amd64 and 1.7455 ulps on i386.
i386's native extra precision is defeated by C bindings and libm not having
any special support for arches with extra precision.

This mysteriously increases efficiency by only < 10%.  The plugged-in code
uses a slow Horner's method evaluation but should still do less than 1/4
as many operations.

XX diff -u2 e_j0.c~ e_j0.c
XX --- e_j0.c~	Thu May 30 18:14:16 2013
XX +++ e_j0.c	Sun Sep  9 03:09:25 2018
XX @@ -80,4 +82,25 @@
XX  S04  =  1.16614003333790000205e-09; /* 0x3E1408BC, 0xF4745D8F */
XX 
XX +/*
XX + * XXX: the compiler cannot be trusted to calculate z2hi and z2lo here.
XX + * gcc on i386 and all compilers on systems with long double == double
XX + * will round z2 to double precision, giving a useless decomposition with
XX + * z2lo == 0.
XX + */
XX +#define z2	2.404825557695772768621631879L
XX +static const double
XX +z2hi = z2,
XX +z2lo = z2 - (double)z2,
XX +PS2[] = {
XX + -1.7291506899370193e-1,		/* -0x162214bb2821dd.0p-55 */
XX +  1.3329146107965132e-2,		/*  0x1b4c4fb4f04355.0p-59 */
XX + -3.9698769373352208e-4,		/* -0x1a045929580f67.0p-64 */
XX +  6.4047723657482110e-6,		/*  0x1add126c1f1b21.0p-70 */
XX + -6.5169501554451472e-8,		/* -0x117e69feeaa2ee.0p-76 */
XX +  4.5704675158948688e-10,		/*  0x1f68739484aa44.0p-84 */
XX + -2.3227382988447959e-12,		/* -0x146e57779789e0.0p-91 */
XX +  7.9237009913278836e-15,		/*  0x11d7b3dfda9418.0p-99 */
XX +};
XX +
XX  static const double zero = 0.0;
XX 
XX @@ -106,4 +129,10 @@
XX  	 * y0(x) = 1/sqrt(pi) * (P(0,x)*ss + Q(0,x)*cc) / sqrt(x)
XX  	 */
XX +		if(ix<0x4006db6d) {
XX +		    z = x*x;
XX +		    return (x-z2hi-z2lo) * (x+z2hi) *
XX +		        (PS2[0]+z*(PS2[1]+z*(PS2[2]+z*(PS2[3]+z*(PS2[4]+
XX +			z*(PS2[5]+z*(PS2[6]+z*PS2[7])))))));
XX +		}
XX  		if(ix>0x48000000) z = (invsqrtpi*cc)/sqrt(x);
XX  		else {
XX diff -u2 e_j0f.c~ e_j0f.c
XX --- e_j0f.c~	Thu May 30 18:14:16 2013
XX +++ e_j0f.c	Sun Sep  9 03:02:36 2018
XX @@ -37,4 +43,16 @@
XX  S04  =  1.1661400734e-09; /* 0x30a045e8 */
XX 
XX +#define z2	2.4048255576957728
XX +static const float
XX +z2hi = z2,
XX +z2lo = z2 - (float)z2,
XX +PS2[] = {
XX + -1.72912285e-1,		/* -0xb10feb.0p-26 */
XX +  1.33266943e-2,		/*  0xda5835.0p-30 */
XX + -3.96135118e-4,		/* -0xcfb05b.0p-35 */
XX +  6.25792291e-6,		/*  0xd1fb26.0p-41 */
XX + -5.25139328e-8,		/* -0xe18bae.0p-48 */
XX +};
XX +
XX  static const float zero = 0.0;
XX 
XX @@ -63,5 +81,10 @@
XX  	 * y0(x) = 1/sqrt(pi) * (P(0,x)*ss + Q(0,x)*cc) / sqrt(x)
XX  	 */
XX -		if(ix>0x80000000) z = (invsqrtpi*cc)/sqrtf(x);
XX +		if(ix<0x4036d917) {
XX +		    z = x*x;
XX +		    return (x-z2hi-z2lo) * (x+z2hi) *
XX +			(PS2[0]+z*(PS2[1]+z*(PS2[2]+z*(PS2[3]+ z*PS2[4]))));
XX +		}
XX +		if(ix>0x58000000) z = (invsqrtpi*cc)/sqrtf(x); /* |x|>2**49 */
XX  		else {
XX  		    u = pzerof(x); v = qzerof(x);
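
Since the XXX comment says the compiler cannot be trusted to do the z2
decomposition, here is a tiny standalone helper (illustrative only, not
part of the patch) that prints the hi+lo parts on a host whose long
double is wider than double, so literal constants could be pasted in
instead.  On a host with long double == double it prints z2lo == 0,
which is exactly the problem the comment warns about.

#include <stdio.h>

int
main(void)
{
	/* First zero of J0, to more than double precision. */
	long double z2 = 2.404825557695772768621631879L;
	double z2hi = (double)z2;
	double z2lo = (double)(z2 - (long double)z2hi);

	/* Decimal and hex forms, in the style of the libm comments. */
	printf("z2hi = %.17e,\t/* %a */\n", z2hi, z2hi);
	printf("z2lo = %.17e,\t/* %a */\n", z2lo, z2lo);
	return (0);
}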

>> ...
>> It obviously can't work on an infinite interval, since there are an infinite
>> number of zeros there, but only a finite number of zeros in R(x) or P(x), and
>> poles in 1/S(x).  It's surprising that the asymptotic expansion seems to
>> find all the zeros of j0f with errors of less than 4 million ulps (according
>> to consistency checks between j0f and j0).
>
> The problem with using an approximation for the infinite product of
> j0(x) is that there doesn't seem to be a similar product for y0(x).
> I haven't contemplated what to do about y0(x).

y0 just needs different minimax approximations.  Sharing pzero and
qzero might be the main reason for using the asymptotic formula where
it doesn't really apply.

Oops, y0 is neither odd nor even, so it is hard to approximate.  I just
found a good formula for this in the source code (I couldn't find one in
A&S or W&W).  It is y0(x) = 2/pi*(j0(x)*ln(x/2)+even_func(x)).

y0(x) is actually undefined for x < 0.  You added a comment about this
when improving the comments about special cases.

We could use a single poly with odd and even terms to approximate y0
directly, and this is probably better than approximating j0, log and
even_func separately, since all methods have errors of several ulps
unless everything uses extra precision.  The advantage of separate
approximations is that the log term requires a high-degree poly and/or
complications whether it is separate or not, and we already have good
versions of it when it is separate (at least for logl, and for log and
logf on i386).  libm does use this formula with its separate
approximations for 0 < x < 2.
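
In case it helps to see that decomposition written out, here is a small
self-contained sketch (mine, not the libm code) that evaluates y0 on
(0, 2) directly from the standard ascending series behind the formula;
libm gets the same value from a minimax rational approximation for the
even part plus its own j0 and log.

#include <math.h>

/*
 * Illustrative only: y0(x) = (2/pi)*(j0(x)*ln(x/2) + even_func(x)) with
 *   even_func(x) = gamma*j0(x)
 *                + sum_{k>=1} (-1)^(k+1) * H_k * (x^2/4)^k / (k!)^2,
 * where H_k = 1 + 1/2 + ... + 1/k (the textbook series, rearranged).
 */
static const double euler_gamma = 0.57721566490153286;

double
y0_series(double x)
{
	double q = x * x / 4.0;
	double term = 1.0;	/* (x^2/4)^k / (k!)^2, starting at k = 0 */
	double hk = 0.0;	/* harmonic number H_k */
	double even = 0.0;
	int k;

	for (k = 1; k <= 30; k++) {	/* plenty of terms for x < 2 */
		term *= q / (k * (double)k);
		hk += 1.0 / k;
		even += (k & 1 ? 1.0 : -1.0) * hk * term;
	}
	even += euler_gamma * j0(x);
	return ((2.0 / M_PI) * (j0(x) * log(x / 2.0) + even));
}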

Bruce

