i386/67469: src/lib/msun/i387/s_tan.S gives incorrect results for large inputs

Sat Feb 5 16:10:22 PST 2005

The following reply was made to PR i386/67469; it has been noted by GNATS.

From: David Schultz <das at FreeBSD.ORG>
To: Bruce Evans <bde at zeta.org.au>
Cc: FreeBSD-gnats-submit at FreeBSD.ORG, freebsd-i386 at FreeBSD.ORG
Subject: Re: i386/67469: src/lib/msun/i387/s_tan.S gives incorrect results for large inputs
Date: Sat, 5 Feb 2005 19:09:12 -0500

 On Sat, Feb 05, 2005, Bruce Evans wrote:
 > Better call the MI tan() to do all this.  It won't take significantly
 > longer, and shouldn't be reached in most cases anyway.

 Yeah, but s_tan.S overrides s_tan.c, so that would require extra
 machinery.  Also, if fptan had worked as advertised and correctly
 identified the cases where range reduction was necessary, calling
 rem_pio2() directly would have avoided superfluous range checking.

 > I think it is because fptan is inherently inaccurate.  [...]

 As you mention, it gets the wrong answer for M_PI.  In general, it
 seems to do pretty badly on numbers close to multiples of pi.
 Granted, one could argue that the answer returned is going to be
 even *further* from the answer the programmer expected (namely, 0),
 so it won't make much difference. ;-)

 BTW, Kahan has a program for producing large floating-point
 numbers close to multiples of pi/2:
 	http://www.cs.berkeley.edu/~wkahan/testpi/

 In investigating this, I discovered several interesting things:

 - The Intel software developer's guide says that fptan has an error
   of at most 1 ulp, but as we've seen, they're lying.

 - Maple 9.5 on sparc64 with Digits:=5000 corroborates fdlibm's answer
   to tan(M_PI).

 - fdlibm's tan() does not seem to be much slower than fptan
   on the range (-M_PI,M_PI), which is what people are most
   likely to care about.  The MD version is faster for some
   inputs where the small angle approximation applies
   (|x| < 2^-28 implies x == tan(x)). fdlibm special cases
   this, too, but the special case isn't early enough, and the
   bugfix in fdlibm 5.3 slows things down.  Throwing that
   special case out of the average, fdlibm seems to be faster!
   (NB: I benchmarked this very sloppily on ref4.  Perhaps
   you could confirm the results...)

 Your suggestion of effectively special-casing large inputs and
 inputs close to multiples of pi is probably the right way to fix
 the inaccuracy.  However, I'm worried that it would wipe away
 any performance benefit of using fptan, if there is a benefit
 at all.  How about punting and removing s_tan.S instead?

 Other unanswered questions (ENOTIME right now):
 - What about sin() and cos()?
 - What about the float versions of these?