cvs commit: src/lib/msun/src k_tanf.c
bde at FreeBSD.org
Thu Nov 24 13:48:41 GMT 2005
bde 2005-11-24 13:48:40 UTC
FreeBSD src repository
Minor cleanups and optimizations:
- Remove dead code that I forgot to remove in the previous commit.
- Calculate the sum of the lower terms of the polynomial (divided by
x**5) in a single expression (sum of odd terms) + (sum of even terms)
with parentheses to control grouping. This is clearer and happens to
give better instruction scheduling for a tiny optimization (an
average of ~0.5 cycles/call on Athlons).
- Calculate the final sum in a single expression with parentheses to
control grouping too. Change the grouping from
first_term + (second_term + sum_of_lower_terms) to
(first_term + second_term) + sum_of_lower_terms. Normally the first
grouping must be used for accuracy, but extra precision makes any
grouping give a correct result so we can group for efficiency. This
is a larger optimization (average 3-4 cycles/call or 5%).
- Use parentheses to indicate that the C order of left to right evaluation
is what is wanted (for efficiency) in a multiplication too.
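A minimal sketch of the two groupings described above, not the code
from k_tanf.c. It assumes plain Taylor coefficients for tan(x) in place
of the real fitted coefficients, and the names T0..T4, lower_terms(),
tan_sketch_old() and tan_sketch_new() are hypothetical. All arithmetic
is done in double and rounded to float only at the end, which is the
extra precision that makes either grouping of the final sum acceptable.

```c
#include <assert.h>
#include <math.h>

/*
 * Taylor coefficients of tan(x), for illustration only (assumption:
 * the real k_tanf.c uses different, fitted coefficients).
 */
static const double T0 = 1.0 / 3.0;		/* x**3 */
static const double T1 = 2.0 / 15.0;		/* x**5 */
static const double T2 = 17.0 / 315.0;		/* x**7 */
static const double T3 = 62.0 / 2835.0;		/* x**9 */
static const double T4 = 1382.0 / 155925.0;	/* x**11 */

/*
 * Sum of the lower terms divided by x**5, with z = x*x:
 *     T1 + T2*z + T3*z**2 + T4*z**3
 * written as (one half of the terms) + (the other half) so that the
 * two halves are independent and can be scheduled in parallel.
 */
static double
lower_terms(double z)
{
	double w = z * z;

	return ((T1 + w * T3) + z * (T2 + w * T4));
}

/* Old grouping: first_term + (second_term + sum_of_lower_terms). */
float
tan_sketch_old(float xf)
{
	double x = xf, z = x * x, s = z * x;

	assert(fabs(x) <= 0.5);		/* sketch is only meant for small |x| */
	return ((float)(x + (s * T0 + (s * z) * lower_terms(z))));
}

/* New grouping: (first_term + second_term) + sum_of_lower_terms. */
float
tan_sketch_new(float xf)
{
	double x = xf, z = x * x, s = z * x;

	assert(fabs(x) <= 0.5);		/* sketch is only meant for small |x| */
	return ((float)((x + s * T0) + (s * z) * lower_terms(z)));
}
```

In the new grouping, x + s*T0 does not have to wait for lower_terms()
to finish, which is the kind of dependency-chain shortening the log
message credits for the saving. The product (s * z) is parenthesized
for the same reason: it can be computed while lower_terms() is still
in flight.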
The old fdlibm code has several optimizations related to these. Two
involve doing an extra operation that can be done almost in parallel
on some superscalar machines but are pessimizations on sequential
machines. Others involve statement ordering or expression grouping.
All of these except the ordering for combining the sums of the odd
and even terms seem to be ideal for Athlons, but parallelism is still
limited so all of these optimizations combined together with the ones
in this commit save only ~6-8 cycles (~10%).
On an AXP, tanf() on uniformly distributed args in [-2pi, 2pi] now
takes 39-59 cycles. I don't know of any more optimizations for tanf()
short of writing it all in asm with very MD instruction scheduling.
Hardware fsin takes 122-138 cycles. Most of the optimizations for
tanf() don't work very well for tan[l](). fdlibm tan() now takes
Revision Changes Path
1.18 +5 -11 src/lib/msun/src/k_tanf.c