i386/67469: src/lib/msun/i387/s_tan.S gives incorrect results for large inputs

Sat Feb 5 16:09:27 PST 2005

On Sat, Feb 05, 2005, Bruce Evans wrote:
> Better call the MI tan() to do all this.  It won't take significantly
> longer, and shouldn't be reached in most cases anyway.

Yeah, but s_tan.S overrides s_tan.c, so that would require extra
machinery.  Also, if fptan had worked as advertised and correctly
identified the cases where range reduction was necessary, calling
rem_pio2() directly would have avoided superfluous range checking.

> I think it is because fptan is inherently inaccurate.  [...]

As you mention, it gets the wrong answer for M_PI.  In general, it
seems to do pretty badly on numbers close to multiples of pi.
Granted, one could argue that the answer returned is going to be
even *further* from the answer the programmer expected (namely, 0),
so it won't make much difference. ;-)

BTW, Kahan has a program for producing large floating-point
numbers close to multiples of pi/2:
	http://www.cs.berkeley.edu/~wkahan/testpi/

In investigating this, I discovered several interesting things:

- The Intel software developer's guide says that fptan has an error
  of at most 1 ulp, but as we've seen, they're lying.

- Maple 9.5 on sparc64 with Digits:=5000 corroborates fdlibm's answer
  to tan(M_PI).

- fdlibm's tan() does not seem to be much slower than fptan
  on the range (-M_PI,M_PI), which is what people are most
  likely to care about.  The MD version is faster for some
  inputs where the small angle approximation applies
  (|x| < 2^-28 implies x == tan(x)). fdlibm special cases
  this, too, but the special case isn't early enough, and the
  bugfix in fdlibm 5.3 slows things down.  Throwing that
  special case out of the average, fdlibm seems to be faster!
  (NB: I benchmarked this very sloppily on ref4.  Perhaps
  you could confirm the results...)

Your suggestion of effectively special-casing large inputs and
inputs close to multiples of pi is probably the right way to fix
the inaccuracy.  However, I'm worried that it would wipe away
any performance benefit of using fptan, if there is a benefit
at all.  How about punting and removing s_tan.S instead?

Other unanswered questions (ENOTIME right now):
- What about sin() and cos()?
- What about the float versions of these?