cvs commit: src/lib/msun/src s_cbrtf.c

Bruce Evans bde at FreeBSD.org
Wed Jan 4 23:57:33 PST 2006


bde         2006-01-05 07:57:31 UTC

  FreeBSD src repository

  Modified files:
    lib/msun/src         s_cbrtf.c 
  Log:
  Use double precision internally to optimize cbrtf(), and change the
  algorithm for the second step significantly to also get a perfectly
  rounded result in round-to-nearest mode.  The resulting speedup
  is about 25% on Athlon64s and 30% on Athlon XPs (about 25 cycles
  out of 100 on the former).
  
  Using extra precision, we don't need to do anything special to avoid
  large rounding errors in the third step (Newton's method), so we can
  regroup terms to avoid a division, increase clarity, and increase
  opportunities for parallelism.  Rearrangement for parallelism loses
  the increase in clarity.  We end up with the same number of operations
  but with a division reduced to a multiplication.
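
  The regrouping described above can be illustrated with a small sketch.
  It is not the committed code.  It assumes the Newton step is applied to
  f(t) = t*t - x/t, and the function names newton_two_div and
  newton_one_div are hypothetical.  Both forms compute the same update;
  the second needs only one division but relies on t*t*t being formed
  with enough spare precision, which is what double precision provides
  for float inputs.

```c
/*
 * Illustrative sketch only: two algebraically equivalent forms of one
 * Newton step for f(t) = t*t - x/t, whose positive root is cbrt(x).
 * The function names are hypothetical.
 */

/* Grouped as a small correction to t: two divisions. */
static double
newton_two_div(double t, double x)
{
	double s = t * t;	/* t^2 */
	double r = x / s;	/* x / t^2 (first division) */
	double w = t + t;	/* 2t */

	r = (r - t) / (w + r);	/* relative correction (second division) */
	return (t + t * r);
}

/* Regrouped: t * (2x + t^3) / (x + 2t^3), a single division. */
static double
newton_one_div(double t, double x)
{
	double r = t * t * t;	/* t^3 */

	return (t * ((x + x + r) / (x + r + r)));
}
```

  For example, starting from t = 2.9 with x = 27, both forms return a
  value within about 1e-4 of 3.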
  
  Using specifically double precision, there is enough extra precision
  for the third step to give enough precision for perfect rounding to
  float precision provided the previous steps are accurate to 16 bits.
  (They were accurate to 12 bits.  That was almost the minimum for
  imperfect rounding in the old version, but it would be more than enough
  for imperfect rounding in this version, where 9 bits would be enough.)
  I couldn't
  find any significant time optimizations from optimizing the previous
  steps, so I decided to optimize for accuracy instead.  The second step
  needed a division although a previous commit optimized it to use a
  polynomial approximation for its main detail, and this division dominated
  the time for the second step.  Use the same Newton's method for the
  second step as for the third step since this is insignificantly slower
  than the division plus the polynomial (now that Newton's method only
  needs 1 division), significantly more accurate, and simpler.  Single
  precision would be precise enough for the second step, but doesn't
  have enough exponent range to handle denormals without the special
  grouping of terms (as in previous versions) that requires another
  division, so we use double precision for both the second and third
  steps.
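
  The overall three-step structure described in this log can be sketched
  as follows.  This is a minimal illustration under stated assumptions,
  not the contents of s_cbrtf.c: the name cbrtf_sketch is hypothetical,
  the first step is assumed to be an exponent-divided-by-3 bit trick, and
  the constants B1 and B2 are assumed bias values of the kind used in
  fdlibm-style code.  The second and third steps are the same one-division
  Newton step, carried out in double precision so that subnormal inputs
  need no special grouping of terms.

```c
#include <stdint.h>
#include <string.h>

/* Assumed bias constants for the rough estimate (illustrative values). */
#define	B1	709958130u	/* normal inputs */
#define	B2	642849266u	/* subnormal inputs scaled by 2^24 */

/* Hypothetical name; a sketch of the structure, not the libm source. */
static float
cbrtf_sketch(float x)
{
	uint32_t hx, sign;
	float t, scaled;
	double T, r;

	memcpy(&hx, &x, sizeof(hx));
	sign = hx & 0x80000000u;
	hx ^= sign;			/* bits of |x| */
	if (hx >= 0x7f800000u)
		return (x + x);		/* Inf or NaN */
	if (hx == 0)
		return (x);		/* +-0 */

	/* Step 1: rough estimate by dividing the biased exponent by 3. */
	if (hx < 0x00800000u) {		/* subnormal: scale by 2^24 first */
		scaled = x * 16777216.0f;
		memcpy(&hx, &scaled, sizeof(hx));
		hx = (hx & 0x7fffffffu) / 3 + B2;
	} else
		hx = hx / 3 + B1;
	hx |= sign;
	memcpy(&t, &hx, sizeof(t));

	/* Step 2: one-division Newton step in double precision. */
	T = t;
	r = T * T * T;
	T = T * (((double)x + x + r) / (x + r + r));

	/* Step 3: the same step again, then round once to float. */
	r = T * T * T;
	T = T * (((double)x + x + r) / (x + r + r));

	return ((float)T);
}
```

  Because T*T*T is formed in double precision, the cube of an estimate
  for a subnormal float input stays well inside the double exponent
  range, so no extra scaling is needed in the Newton steps.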
  
  Revision  Changes    Path
  1.15      +14 -29    src/lib/msun/src/s_cbrtf.c

