[PATCH] hypotl, cabsl, and code removal in cabs

Thu Dec 6 22:58:46 PST 2007

On Thu, 6 Dec 2007, Steve Kargl wrote:

> On Thu, Dec 06, 2007 at 04:08:33AM -0500, David Schultz wrote:
>> Also, umm, I've been busy and unable to pay attention for a while,
>> so forgive me if I'm missing something, but isn't it the case that
>> we don't have a sqrtl(), except for the gcc builtin on some
>> architectures?
>
> bde pointed me to the right file in src/libm/ieee that explains
> the rounding issues with hypotl.  I haven't had a chance to
> update my implementation to use extra care in the evaluation of
> a*a+b*b.

I fixed it in your mailbox for the float precision case.  (It is useful
to test algorithms for the float precision case, since only that case
can be tested reasonably exhaustively (not actually exhaustively for
2-arg functions like hypotf()).  But after a lot of work, the debugged
version reduces to almost the fdlibm version except for different
style bugs.)

> As to the sqrtl question, I have an implementation that supposedly
> does correct rounding in all rounding modes.  It is restricted to
> 64-bit significand long doubles.  The code does not use bit twiddle;
> instead, it uses fenv.

This I haven't looked at closely.  I fear extreme slowness.  On
athlon-xp, fenv accesses take about 100 cycles each (129 for fldenv
and 89 for fstenv; thus > 200 for fldenv+fstenv in a C-level fenv
access), while bit twiddling instructions can be executed at up to 3
per cycle.  mxcsr accesses are much faster, but mxcsr just adds more
environment to handle for general C-level access functions, since the
i387 and the SSE environments must be maintained in parallel, even on
amd64 in case someone actually uses long doubles (SSE would suffice
without long doubles).

Anyway, the software version of sqrtl is irrelevant on
athlon-xp, since athlon-xp has sqrtl in hardware (takes 35 cycles).
Similarly for amd64, ia64 and possibly sparc64 (sparc64 has sqrt in
hardware so it hopefully has sqrtl in hardware).  arm and powerpc
apparently have long double == double, so the software version of sqrtl
is apparently only needed on sparc64.

When gcc actually supports C99+IEC-mumble floating point,
rounding and setting exception flags will have to continue to be
handled using bit fiddling integer instructions or ordinary FP
instructions, possibly moved to the C fenv access functions, since
i387 fenv accesses are too slow to use for anything except
initialization.

Bruce