standards/142803: j0 Bessel function inaccurate near zeros of the function

Tue Feb 23 00:21:23 UTC 2010

Bruce Evans wrote:
> On Thu, 14 Jan 2010, Steven G. Kargl wrote:
> 
> > Bruce Evans wrote:
> >> On Wed, 13 Jan 2010, Steven G. Kargl wrote:
> >>
> >>>> Description:
> >>>
> >>> The j0 bessel function supplied by libm is fairly inaccurate at
> >>> ...
> >>
> >> This is a very hard and relatively unimportant problem.
> >
> > Yes, it is very hard, but apparently you do not use bessel
> > functions in your everyday life. :)
> >
> > I only discover this issue because I need bessel functions
> > of complex arguments and I found my routines have issues
> > in the vicinity of zeros.  So, I decided to look at the
> > libm routines.
> 
> It is interesting that the often-poor accuracy of almost every system's
> libm matters in real life.
> 
> Complex args are another interesting problem since even complex
> multiplication is hard to do accurately (it may have large cancelation
> errors).
> 
> >> Anyway, if you can get anywhere near < 10 ulp error near all zeros using
> >> only an asymptotic method, then that would be good.  Then the asymptotic
> >> method would also be capable of locating the zeros very accurately.  But
> >> I would be very surprised if this worked.  I know of nothing similar for
> >> reducing mod Pi for trigonometric functions, which seems a simpler problem.
> >> I would expect it to at best involve thousands of binary digits in the
> >> tables for the asymptotic method, and corresponding thousands of digits
> >> of precision in the calculation (4000 as for mfpr enough for the 2**100th
> >> zero?).
> >
> > The 4000-bit setting for mpfr was a hold over from testing mpfr_j0
> > against my ascending series implementation of j0 with mpfr
> > primitives.  As few as 128-bits is sufficient to achieve the
> > following:
> >
> >>>    x        my j0f(x)     libm j0f(x)    MPFR j0        my err  libm err
> >    2.404825  5.6434398E-08  5.9634296E-08  5.6434400E-08      0.05 152824.59
> >    5.520078  2.4476657E-08  2.4153294E-08  2.4476659E-08      0.10  18878.52
> >    8.653728  1.0355303E-07  1.0359805E-07  1.0355306E-07      0.86   1694.47
> >   11.791534 -3.5291243E-09 -3.5193941E-09 -3.5301714E-09     75.93    781.53
> 
> Wonder why this jumps.
> 
> >   14.930918 -6.4815082E-09 -6.3911618E-09 -6.4815052E-09      0.23   6722.88
> >   18.071064  5.1532352E-09  5.3149818E-09  5.1532318E-09      0.23  10910.50
> >   21.211637 -1.5023349E-07 -1.5002509E-07 -1.5023348E-07      2.70  56347.01
> > ...
> >
> > As I suspected by adding additional terms to the asymptotic
> > approximation and performing all computations with double
> > precision, reduces 'my err' (5th column).  The value at
> > x=11.7... is the best I can get.  The asymptotic approximations
> > contain divergent series and additional terms do not help.
> 
> The extra precision is almost certainly necessary.  Whether double
> precision is nearly enough is unclear, but the error near 11.7 suggests
> that it is nearly enough except there.  The large error might be caused
> by that zero alone (among small zeros) being very close to a representable
> value.
> 
> I forgot the mention that the error table in my previous mail is on amd64
> for comparing float precision functions with double precision ones, assuming
> that the latter are correct, which they aren't, but they are hopefully
> correct enough for this comparision.  The errors on i386 are much larger,
> due to i386 still using i387 hardware trigonometric which are extremely
> inaccurate near zeros, starting at the first zero.  Here are both tables:
> 
> amd64:
> %%%
> j0:    max_er = 0x7fffffffffdf5e07 17179869183.9960, avg_er = 5.581, #>=1:0.5 = 1593961230:1839722934
> j1:    max_er = 0x7fffffffffbcd2a1 17179869183.9918, avg_er = 4.524, #>=1:0.5 = 1597678928:1856295142
> lgamma:max_er = 0x135a0b77e00000 10145883.7461, avg_er = 0.252, #>=1:0.5 = 44084256:331444835
> y0:    max_er = 0x7fffffffffbcd2a0 17179869183.9918, avg_er = 2.379, #>=1:0.5 = 837057577:1437331064
> y1:    max_er = 0x7fffffffffdf5e07 17179869183.9960, avg_er = 3.761, #>=1:0.5 = 865063612:1460264955
> %%%
> 
> i386:
> %%%
> j0:    max_er = 0x6f8686c5c9f26 3654467.3863, avg_er = 0.418, #>=1:0.5 = 671562746:1332944948
> j1:    max_er = 0x449026e9c286f 2246675.4566, avg_er = 0.425, #>=1:0.5 = 674510414:1347770568
> lgamma:max_er = 0xe4cf242400000 7497618.0703, avg_er = 0.274, #>=1:0.5 = 70033452:508702222
> y0:    max_er = 0x93a2340c00000 4837658.0234, avg_er = 0.380, #>=1:0.5 = 594207097:1303826516
> y1:    max_er = 0x7ffa2068256f72ab 17176789825.1699, avg_er = 5.188, #>=1:0.5 = 459137173:1213136103
> %%%
> 
> Unfortunately, most of these i386 errors, and the amd64 error for y1()
> are misreported.  The huge max_er of 0x7ff... (16 hex digits) is
> actually a sign error misreported.  Sign errors are bad enough.  They
> always result at a simple zero z0 (f'(z0) != 0) when the approximation
> is so inaccurate that it cannot tell on which side of infinite-precision
> z0 the parameter is.  Methods involving a table of zeros will not have
> any of these provided the table is accurate to within 1 ulp, but other
> methods can easily have them, depending on the other method's ability
> to locate the zeros to within 1 ulp by solving f~(z) ~= 0 where f~ is
> the approximate f.
> 
> Having no sign errors across the whole range seems too good to believe
> for the amd64 functions.  All of the i387 hardware trig functions have
> sign errors.
> 

Bruce, David,

Since the my real-life work requires additional accuracy for
the Bessel function routines in libm, I spent sometime playing
with e_j0f.  I've come up with the patch that follows my signature.
It only deals with the first 32 zeros of j0(x), others can be
added to the table.  I've yet to come up with a general method
to improve the accuracy around all zeros.   To help you gauge
the improvement, one can look at

http://troutmask.apl.washington.edu/~kargl/j0f.jpg
http://troutmask.apl.washington.edu/~kargl/j0f_1.jpg

which compares the unpatched j0f(x) to the patched j0f(x).

The patch makes use of the addition theorem for J0(x+d) where
x is a zero and d is some small deviation from the zero.  Using
only the first 4 terms is sufficent for j0f.

J0(x+d) = - J1(x)*J1(d) + J2(x)*J2(d) - J3(x)*J3(d) + J4(x)*J4(d)

where J1(d), J2(d), J3(d) and J4(d) can be approximated by 
truncating their series representations.  The functions with x, ie
the known zero, are evaluated with MPFR.  Note, d is determined
from d = z - x = z - high(x) - low(x) where high(x) is the 1st
24 bits of the zero, low(x) is the next 24 bits of zero, and 
z is some value in [x-0.065,x+0.065].

Bruce (and/or David), if you think that this merits inclusion
in FreeBSD's libm, I'll work up a similar patch for e_j0.c and
submit both; otherwise, I keep it as a local change. 

-- 
Steve
http://troutmask.apl.washington.edu/~kargl/


Index: src/e_j0f.c
===================================================================

--- src/e_j0f.c	(revision 204219)
+++ src/e_j0f.c	(working copy)
@@ -22,6 +22,8 @@
 static float pzerof(float), qzerof(float);
 
 static const float
+invpi	= 0.31830988618379067154,
+pi	= 3.14159265358979323846,
 huge 	= 1e30,
 one	= 1.0,
 invsqrtpi=  5.6418961287e-01, /* 0x3f106ebb */
@@ -38,17 +40,157 @@
 
 static const float zero = 0.0;
 
+/* J1(x) for x << 1. */
+static float
+_j1(float x2)
+{
+	return (x2 * (1 - 0.5 * x2 * x2));
+}
+
+/* J2(x) for x << 1. */
+static float
+_j2(float x2)
+{
+	x2 *= x2;
+	return (0.5 * x2 * (1 - x2 / 3));
+}
+
+/* J3(x) for x << 1. */
+static float
+_j3(float x2)
+{
+	float y;
+	y =  x2 * x2;
+	return (x2 * y * (1 - 0.25 * y) / 6);
+}
+
+/* J4(x) for x << 1. */
+static float
+_j4(float x2)
+{
+	x2 *= x2;
+	return (x2 * x2 / 24);
+}
+
+/*
+ * Use the addition theorem for values near a zero the Bessel function.
+ * J0(z) = J0(x + d) where |d| << 1.
+ *       = - 2*J1(x)*J1(d) + 2*J2(x)*J2(d) - ...
+ * Note, the factor of 2 is included in the table.
+ */
 float
+j0zerof(int n, float z)
+{
+	static const float x[32][6] = {
+	    {0x1.33d152p+1, 0x1.d2e368p-24, 0x1.09cdb4p+0, 0x1.ba1deep-1,
+		0x1.978d44p-2, 0x1.0933ccp-3},
+	    {0x1.6148f6p+2, -0x1.34f46ep-24, -0x1.5c6e6p-1, -0x1.f8f72ep-3,
+		0x1.00f404p-1, 0x1.9588cep-1},
+	    {0x1.14eb56p+3, 0x1.999bdap-22, 0x1.15f798p-1, 0x1.00f7fcp-3,
+		-0x1.f08b9p-2, -0x1.d8c2a8p-2},
+	    {0x1.79544p+3, 0x1.04e56cp-26, -0x1.dc13e6p-2, -0x1.42ff0cp-4,
+		0x1.c0af7ep-2, 0x1.350edcp-2},
+	    {0x1.ddca14p+3, -0x1.0d8e2ep-25, 0x1.a701dp-2, 0x1.c54b94p-5,
+		-0x1.97d3ccp-2, -0x1.b9186p-3},
+	    {0x1.212314p+4, -0x1.d7982ap-26, -0x1.8077f6p-2, -0x1.5467ecp-5,
+		0x1.770cdp-2, 0x1.4e26dp-3},
+	    {0x1.5362dep+4, -0x1.d1810ep-21, 0x1.62d93ap-2, 0x1.0ba9cep-5,
+		-0x1.5c8a0ap-2, -0x1.08180cp-3},
+	    {0x1.85a3bap+4, -0x1.9fd524p-21, -0x1.4b2a2ep-2, -0x1.b3298p-6,
+		0x1.46b28cp-2, 0x1.aec26ap-4},
+	    {0x1.b7e54ap+4, 0x1.7f57c4p-22, 0x1.37aac8p-2, 0x1.6ac0d2p-6,
+		-0x1.345e5cp-2, -0x1.67dfb2p-4},
+	    {0x1.ea275ap+4, -0x1.c68826p-21, -0x1.27407ep-2, -0x1.34695p-6,
+		0x1.24bc2ep-2, 0x1.32708ap-4},
+	    {0x1.0e34e2p+5, -0x1.8b3204p-20, 0x1.192f24p-2, 0x1.0a6682p-6,
+		-0x1.17365ap-2, -0x1.08ffd2p-4},
+	    {0x1.275638p+5, -0x1.5a7986p-21, -0x1.0cf3eep-2, -0x1.d242aap-7,
+		0x1.0b5fc4p-2, 0x1.d0352cp-5},
+	    {0x1.4077a8p+5, -0x1.29d6c6p-23, 0x1.0230bap-2, 0x1.9c8084p-7,
+		-0x1.00e734p-2, -0x1.9af5aap-5},
+	    {0x1.59992cp+5, 0x1.974364p-21, -0x1.f13fbp-3, -0x1.70558ep-7,
+		0x1.ef1ep-3, 0x1.6f2666p-5},
+	    {0x1.72bacp+5, 0x1.f02102p-20, 0x1.e018dap-3, 0x1.4b858ap-7,
+		-0x1.de4fp-3, -0x1.4a986ap-5},
+	    {0x1.8bdc62p+5, 0x1.27e0cap-20, -0x1.d09b22p-3, -0x1.2c74f6p-7,
+		0x1.cf1686p-3, 0x1.2bb87cp-5},
+	    {0x1.a4fe0ep+5, 0x1.c8899p-20, 0x1.c28612p-3, 0x1.11f526p-7,
+		-0x1.c138e4p-3, -0x1.115d32p-5},
+	    {0x1.be1fc4p+5, 0x1.a4c606p-23, -0x1.b5a622p-3, -0x1.f645fep-8,
+		0x1.b485eap-3, 0x1.f54de8p-6},
+	    {0x1.d7418p+5, 0x1.93c83ep-20, 0x1.a9d184p-3, 0x1.cea254p-8,
+		-0x1.a8d632p-3, -0x1.cdd58ap-6},
+	    {0x1.f06344p+5, -0x1.7b4716p-22, -0x1.9ee5eep-3, -0x1.abf28ap-8,
+		0x1.9e093ap-3, 0x1.ab47cep-6},
+	    {0x1.04c286p+6, 0x1.0f88f2p-21, 0x1.94c6f6p-3, 0x1.8d6372p-8,
+		-0x1.9403e4p-3, -0x1.8cd3dp-6},
+	    {0x1.11536cp+6, 0x1.645ae6p-19, -0x1.8b5ccap-3, -0x1.724d02p-8,
+		0x1.8aaf6p-3, 0x1.71d33p-6},
+	    {0x1.1de456p+6, -0x1.6bc7a4p-19, 0x1.829356p-3, 0x1.5a280ep-8,
+		-0x1.81f85cp-3, -0x1.59bff8p-6},
+	    {0x1.2a754p+6, -0x1.5f7eep-20, -0x1.7a597ep-3, -0x1.4486cp-8,
+		0x1.79ce5p-3, 0x1.442d38p-6},
+	    {0x1.37062cp+6, -0x1.ab28bap-20, 0x1.72a09ap-3, 0x1.310f06p-8,
+		-0x1.72230ep-3, -0x1.30c186p-6},
+	    {0x1.439718p+6, 0x1.c5c6f4p-19, -0x1.6b5c04p-3, -0x1.1f766p-8,
+		0x1.6aea5p-3, 0x1.1f32e8p-6},
+	    {0x1.502808p+6, -0x1.2cbbcep-19, 0x1.6480c4p-3, 0x1.0f7eb8p-8,
+		-0x1.641964p-3, -0x1.0f43acp-6},
+	    {0x1.5cb8f8p+6, -0x1.f0c70ap-19, -0x1.5e0544p-3, -0x1.00f3f2p-8,
+		0x1.5da6f4p-3, 0x1.00c004p-6},
+	    {0x1.6949e8p+6, -0x1.81383ep-20, 0x1.57e12p-3, 0x1.e75414p-9,
+		-0x1.578accp-3, -0x1.e6f852p-7},
+	    {0x1.75dadap+6, -0x1.cea80cp-19, -0x1.520ceep-3, -0x1.cef722p-9,
+		0x1.51bdacp-3, 0x1.cea5bap-7},
+	    {0x1.826bccp+6, -0x1.46c93cp-19, 0x1.4c8222p-3, 0x1.b89114p-9,
+		-0x1.4c392ap-3, -0x1.b8489p-7},
+	    {0x1.8efcbep+6, 0x1.615096p-20, -0x1.473ae6p-3, -0x1.a3eaeap-9,
+		0x1.46f78ap-3, 0x1.a3aa16p-7},
+	};
+
+	float j, d, d2;
+	d = z - x[n][0] - x[n][1];
+	d2 = d / 2;
+	j =  - x[n][2] * _j1(d2) + x[n][3] * _j2(d2) - 
+	       x[n][4] * _j3(d2) + x[n][5] * _j4(d2);
+	return (j);
+}
+
+/*
+ * The zeros of J0(x) fall near x = (n + 0.75) * pi.  These are offsets
+ * that are used to localize a value of x into a small interval about a 
+ * zero.
+ */
+static const float offset[32] = {
+    4.86309e-02, 2.22910e-02, 1.43477e-02, 1.05619e-02,
+    8.35263e-03, 6.90623e-03, 5.88708e-03, 5.12924e-03,
+    4.54305e-03, 4.07894e-03, 3.70066e-03, 3.38531e-03,
+    3.11957e-03, 2.89196e-03, 2.69488e-03, 2.52450e-03,
+    2.37319e-03, 2.24095e-03, 2.12016e-03, 2.01463e-03,
+    1.91673e-03, 1.82645e-03, 1.75144e-03, 1.67643e-03,
+    1.60904e-03, 1.54166e-03, 1.48953e-03, 1.43740e-03,
+    1.38528e-03, 1.34078e-03, 1.29628e-03, 1.25179e-03
+};
+
+
+float
 __ieee754_j0f(float x)
 {
 	float z, s,c,ss,cc,r,u,v;
-	int32_t hx,ix;
+	int32_t hx,ix,k;
 
 	GET_FLOAT_WORD(hx,x);
 	ix = hx&0x7fffffff;
 	if(ix>=0x7f800000) return one/(x*x);
 	x = fabsf(x);
 	if(ix >= 0x40000000) {	/* |x| >= 2.0 */
+
+		if (x < 100.) {
+			k = (int)(x * invpi - 0.75);
+			if (fabsf(x - (k + 0.75) * pi - offset[k]) < 0.065)
+				return (j0zerof(k, x));
+		}
+
 		s = sinf(x);
 		c = cosf(x);
 		ss = s-c;