Complex arg-trig functions
Bruce Evans
brde at optusnet.com.au
Fri Sep 21 08:25:13 UTC 2012
On Thu, 20 Sep 2012, Stephen Montgomery-Smith wrote:
> I also added inexact optimizations for casinh and cacos.
I couldn't get this to give the hoped-for optimization and dropped
it even for catanh. You may still prefer it because it is simpler.
But later I found intricacies for returning the correct value of
Pi/2 which make the inexact optimizations even less useful:
The real functions are careful to a fault to return Pi/2 correctly
rounded in all rounding modes. They don't use return a constant Pi/2,
but evaluate Pi/2 at runtime using pio2_hi + pio2_lo, where pio2_hi
is (or should be) Pi/2 rounded _down_ and pio2_lo is an approximation
to the residual and is volatile enough for the addition to be done at
runtime. The following shortcuts lose this care:
- similarly, but with pio2_hi = Pi/2 rounded up. Now pio2_hi + pio2_lo
is 1 ulp too high when the rounding mode is either up or towards
plus infinity. Rounding of Pi/2 to nearest may go either way.
fdlibm code seems to be careful to round it down in all cases. In
in FreeBSD libm, at least e_acosf.c is careful to round down when
the natural rounding is up, but at invtrig.c is not careful -- it
apparently uses natural rounding, which happens to be up for ld80
and down for ld128, or vice versa.
- similary, but with pio2_hi rounded to nearest and 'tiny' used instead
of pio2_lo. Using 'tiny' requires pio2_hi to be nearest and only
works in some rounding modes.
- similarly, but with just m_pi_2 = Pi/2 rounded to nearest. Now there
is no runtime evaluation, so the result cannot depend on the rounding
mode and inexact must be set in some other way.
A quick test of most functions in all rounding mode shows that non-default
modes work quite well except for the most complicated and/or heavily
optimized functions when they are written in C (the totally failing ones
are sin/cos/tan/exp*/pow/hypot but not log* (except for log*(1)) or most
inverse functions. Optimizations in sin/cos/tan/exp2 require rounding
to nearest). My tests weren't non-quick enough to detect any 1-ulp
errors for Pi/2, and only showed that the errors mostly don't blow up for
inverse functions.
In view of this, I'd like to keep doing the Pi/2 intricacies.
Partial diffs for catrig.c:
% 48a49,50
% > #define pio2_hi m_pi_2 /* works because m_pi_2 rounded down */
% > pio2_lo = 6.1232339957367660e-17, /* 0x3C91A626, 0x33145C07 */
% 384,389c390,407
% 335c341
% < * cacos(z) = PI/2 - z + O(|z|^3) as z -> 0
% ---
% > * cacos(z) = PI/2 - z + O(z^3) as z -> 0
This should be PI/2 + O(z) when only the constant term is used, but I
restored use of the z term.
Start changing O(|n|) to O(n). The absolute value should be implicit.
% 384,389c390,407
% < ...
% < if (ax < DBL_EPSILON/8 && ay < SQRT_6_EPSILON/8)
% < return (cpack(m_pi_2, -y));
% ---
% > if (ax < SQRT_EPSILON && ay < SQRT_EPSILON)
% > /*
% > * This is quite subtle. The expression for PI/2 - x
% > * is cloned from e_acos.c, where it is apparently over-
% > * designed to work in all rounding modes. It requires
% > * pio2_hi to be rounded down even when rounding to
% > * nearest would be more accurate. We can't add `tiny'
% > * to pio2_hi as usual to raise inexact, since this would
% > * break the fussy rounding in some non-default modes.
% > * So we use the same method to raise inexact as for the
% > * approximation 'z'. e_acos.c uses the even subtler
% > * method of depending on inexactness in a higher-degree
% > * approximation. That is not practical here, since if
% > * we used the x**3 term then we would need an extra
% > * case to avoid spurious underflow.
% > */
% > if ((int)ax == 0 && (int)ay == 0)
% > return (cpack(pio2_hi - (x - pio2_lo), -y));
Despite being too verbose (BTW, don't commit my essays :-), the comment
neglects to point out that with the expression written in this form,
inexact must be set separately since (x - pio2_lo) might be a value
(e.g., 0) that doesn't give inexactness when subtracted.
All other returns of Pi/2 are simpler than this, and should return
pio2_hi + pio2_lo. The constants should be spelled like this, and
not using M_PI or m_pi_2; this is especially important in long double
precision since then the constants are declared/defined with this
spelling in extern constant tables in invtrig.c to centralize the
complications for defining them them for all combinations of ld80/
ld128/i386. So my patch for this is simplest for long double
precision -- there it uses invtrig.h and doesn't worry about the
known bug that pio2_hi is incorrectly rounded in some cases.
With these intricacies, there is less to be gained by setting inexact
up front. Adding pio2_lo sequentially is slightly slower than an
up-front setting in parallel, but when both are done the up-front
setting just adds overhead on average. Some of the optimizations
could be done more globably:
- an option to not support nonstandard rounding modes for Pi/2. This
seems to require pio2_lo to be a static const, unlike in invtrig.*.
Make this non-volatile. The compiler will then evaluate
pio2_hi + pio2_lo at compile time.
- an option to not support careful setting of inexact. The above
gives it for Pi/2. Settings of it using (1 + tiny) == 1 would
work similarly -- make `tiny' a static nonvolatile const.
Bruce
More information about the freebsd-numerics
mailing list