i386/67469: src/lib/msun/i387/s_tan.S gives incorrect results for large inputs

Sat Feb 5 23:40:13 PST 2005

The following reply was made to PR i386/67469; it has been noted by GNATS.

From: Bruce Evans <bde at zeta.org.au>
To: David Schultz <das at freebsd.org>
Cc: FreeBSD-gnats-submit at freebsd.org, freebsd-i386 at freebsd.org
Subject: Re: i386/67469: src/lib/msun/i387/s_tan.S gives incorrect results
 for large inputs
Date: Sun, 6 Feb 2005 18:32:38 +1100 (EST)

 On Sat, 5 Feb 2005, David Schultz wrote:

 > On Sat, Feb 05, 2005, Bruce Evans wrote:
 > > Better call the MI tan() to do all this.  It won't take significantly
 > > longer, and shouldn't be reached in most cases anyway.
 >
 > Yeah, but s_tan.S overrides s_tan.c, so that would require extra
 > machinery.  Also, if fptan had worked as advertised and correctly
 > identified the cases where range reduction was necessary, calling
 > rem_pio2() directly would have avoided superfluous range checking.

 I hacked the file (but not the machinery) easily enough for testing.

 fptan advertises to work? :-)  At least the amd64 manual (instruction
 reference) only claims that it sets C2 if the arg is outside the range
 [-2^63,2^63].

 > > I think it is because fptan is inherently inaccurate.  [...]
 >
 > As you mention, it gets the wrong answer for M_PI.  In general, it
 > seems to do pretty badly on numbers close to multiples of pi.
 > Granted, one could argue that the answer returned is going to be
 > even *further* from the answer the programmer expected (namely, 0),
 > so it won't make much difference. ;-)

 Too bad the correct answer is a little further from 0.

 > ...
 > - The Intel software developer's guide says that fptan has an error
 >   of at most 1 ulp, but as we've seen, they're lying.

 The amd64 manual (application refrence) says "x87 computations are carried
 out in double extended precision format, so that the transcendental
 functions [are accurate to 1 ulp]".  The "so that" part doesn't follow.
 We are seeing something like naive extended precision computations and how
 inaccurate they can be even when only a double precision result is wanted.

 > [... too much detail to reply to :-)]
 > Your suggestion of effectively special-casing large inputs and
 > inputs close to multiples of pi is probably the right way to fix
 > the inaccuracy.  However, I'm worried that it would wipe away
 > any performance benefit of using fptan, if there is a benefit
 > at all.  How about punting and removing s_tan.S instead?

 The problem affects at least sin() and cos() too, so I think throwing
 away the optimized versions is too extreme.  Perhaps a single range
 check to let the MD version handle only values near 0 ([-pi/2+eps,
 pi/2-eps]? would be efficient enough.

 > Other unanswered questions (ENOTIME right now):
 > - What about sin() and cos()?
 > - What about the float versions of these?

 I tested sin().  It misbehaves similarly (with identical results for
 args (2 * n * M_PI).

 It's interesting that there is fptan and not ftan, but fsin and not fpsin.
 Both are partial.

 I don't run -current and hadn't seen your code for the MD float versions...
 They are buggier:
 (1) the exponent can be up to 127, so more than half of the representable
     values exceed 2^63 in magnitude and thus need range reduction.  Results
     for tan(0x1p64) using various methods:

       buggy MD tanf:   0x1p+64                 1.84467e+19
       buggy MD tan:   -0x1.8467b926c834bp-2   -0.379302
       fdlibm MI tanf: -0x1.82bee6p-6          -0.0236051
       bc s(x)/c(x) (scale=40):                 -.02360508353334969937000...
       bc s(x)/c(x) (default scale=20):         -.02358521765210826916

     It looks like fdlibm is perfect and scale=20 in bc is not quite enough.
     sinf(0x1p64) would be 0x1p64 too.  This is preposterous for sin().
 (2) they return extra bits of precision.  The MD double precision versions
     have the same bug; the bug is just clearer for the float case, and
     cannot be fixed by FreeBSD's rounding precision hack.

 BTW, I don't trust the range reduction for floating point pi/2 but
 never finished debugging or fixing it.  According to my old comment,
 it is off by pi/2 and very slow for many values between 32768 and
 65535.  However, I couldn't duplicate this misbehaviour last time I
 looked at it.  I used to fix it using the double version.

 Bruce