extra-precision bugs in clang on i386 even with __SSE*_MATH__

Wed May 22 08:51:22 UTC 2013

Only replying to some secondary points in the message.

On Wed, May 22, 2013 at 02:55:19PM +1000, Bruce Evans wrote:
> clang with certain march= in CFLAGS uses SSE for double and/or float
> operations even on i386 although the ABI doesn't really allow this,
> and sets __SSE2_MATH__ and/or __SSE_MATH__ to indicate this.  It is
> well known that this breaks the definitions of float_t, double_t and
> FLT_EVAL_METHOD, because FreeBSD headers haven't been updated to support
> clang; in particular they know nothing of __SSE*_MATH__.

Hm, the i386 ABI is silent about the use of the XMM registers, which
means that the registers are caller-saved, if available.  And the ABI
cannot forbid the use of any processor instruction at all.  ABI-conformant
code may use any supported CPU instruction (and unsupported ones as
well, if SIGILL is the intended outcome).
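
For anyone who wants to see what their own compiler and headers claim,
here is a rough sketch of a check.  The function names are made up for
illustration.  The assumption baked in is that a compiler defining
__SSE2_MATH__ evaluates float and double without extra precision, so
consistent headers would report FLT_EVAL_METHOD == 0 with float_t and
double_t being plain float and double.

```c
#include <float.h>
#include <math.h>
#include <stdio.h>

/* 2 if __SSE2_MATH__ is defined, 1 if only __SSE_MATH__ is, else 0. */
int sse_math_level(void)
{
#if defined(__SSE2_MATH__)
    return 2;
#elif defined(__SSE_MATH__)
    return 1;
#else
    return 0;
#endif
}

/*
 * Returns 1 when the headers agree with what the compiler advertises.
 * Assumption: with __SSE2_MATH__ there is no extra precision, so we
 * expect FLT_EVAL_METHOD == 0, float_t == float, double_t == double.
 * Without __SSE2_MATH__ nothing is checked and 1 is returned.
 */
int headers_match_sse_math(void)
{
    if (sse_math_level() < 2)
        return 1;
    return FLT_EVAL_METHOD == 0 &&
        sizeof(float_t) == sizeof(float) &&
        sizeof(double_t) == sizeof(double);
}

/* Print the raw values so a mismatch can be inspected by eye. */
void print_eval_info(FILE *out)
{
    fprintf(out, "SSE math level:   %d\n", sse_math_level());
    fprintf(out, "FLT_EVAL_METHOD:  %d\n", (int)FLT_EVAL_METHOD);
    fprintf(out, "sizeof(float_t):  %zu\n", sizeof(float_t));
    fprintf(out, "sizeof(double_t): %zu\n", sizeof(double_t));
    fprintf(out, "headers match:    %d\n", headers_match_sse_math());
}
```

Calling print_eval_info(stdout) from main() and building the same file
with different march= settings should show whether the macros and the
header definitions move together.  Comparing sizes is only a proxy for
comparing types, but it is enough to spot float_t being wider than
float.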

> C11 breaks this area even more.  It specifies that extra precision is
> always clipped on return, at least in C functions.  This breaks
> intentionally returning extra precision for accuracy, and more seriously
> it breaks efficiency by requiring the slow clipping operation on every
> return (on x86, the clipping operation takes about as long as 2
> serially-dependent addition operations and stalls pipelines due to
> its serial dependencies).

SSE conversions like CVTSD2SS are very fast.  According to Agner Fog's
instruction tables, on Sandy Bridge-class CPUs the instruction has a
latency of 3 cycles, and a new CVTSD2SS can be started every cycle.
This is comparable to simple integer arithmetic.
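
To make the operation concrete: when double arithmetic is done in SSE2,
a scalar double-to-float cast like the ones below is what compilers
typically turn into a single CVTSD2SS.  This is only a sketch with
hypothetical function names, not a benchmark, and the code actually
generated depends on the compiler and flags.

```c
#include <stddef.h>

/* Round a double to float precision; this is the "clipping" step. */
float narrow_to_float(double x)
{
    return (float)x;
}

/*
 * Accumulate in double for accuracy, then narrow once on return.
 * Only the final cast discards the extra precision.
 */
float sum_as_float(const float *v, size_t n)
{
    double acc = 0.0;
    size_t i;

    for (i = 0; i < n; i++)
        acc += v[i];
    return (float)acc;
}
```

Values such as 0.5 are exactly representable in float and survive the
narrowing unchanged, while 0.1 does not: (double)narrow_to_float(0.1)
compares unequal to 0.1.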