kern/133583: [libm] fma(3) does not respect rounding mode using extended precision

Fri Dec 3 13:45:53 UTC 2010

On Fri, 3 Dec 2010 das at FreeBSD.org wrote:

> Synopsis: [libm] fma(3) does not respect rounding mode using extended precision
> Thanks for the report! This limitation is described in the source for
> fma(), and unfortunately, it is unlikely to ever change. There are
> several reasons:
>
> - We are a long way from having the necessary compiler support to make
>  dynamic precision changes work as expected.
> - Dynamic FPU precision changes aren't officially supported, and
>  fpsetprec() has been documented as deprecated for many years.

Not really.  See my reply to the commit to the man pages.

> - The only supported architecture that can have this problem due to
>  dynamic precision changes is i386, and even then only for non-SSE2
>  builds.

SSE2 makes little difference to this problem for i386, except that with
clang it makes it worse.  The ABI requires using the FPU for at least
returning values, and gcc keeps using the FPU for operations too.  OTOH,
clang uses SSE2 for operations.  This gives an even larger pessimization
than I expected
     (in 1 example, clang with a wrong arch (nocona instead of core2,
     since gcc doesn't support -march=core2 yet and I used the same flags
     for clang as for gcc), clang was 170/45 times slower; with
     -march=core2, it was only 139/45 times slower; with -march=i386,
     it was only 88/45 times slower.  Here -march=i386 works mainly
     by avoiding even useful SSE1 instructions.  The example
     was a float function, so it only needed SSE1.  Restoring use of
     SSE1 using -march=athlon-xp restores the slowness to 144/45.)
It also makes the precision used more unpredictable than before.  It
now depends on $CC and $CFLAGS, but float.h doesn't.  Fortunately,
i386 float.h covers some cases by defining FLT_EVAL_METHOD = -1, which
says that the FP evaluation method is indeterminate :-).  Unfortunately,
i386 float.h's definition of float_t as double becomes wrong if floats
are actually evaluated in float precision, as they are when clang uses
SSE1.
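
To see which case a given $CC and $CFLAGS actually produce, a small probe
along these lines can help.  This is only a sketch: the function names are
made up for illustration, and the probe assumes the default round-to-nearest
mode.  It compares what float.h and math.h claim with what a float
expression really does.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* What <float.h> claims about the evaluation method (C99). */
int claimed_eval_method(void)
{
    return FLT_EVAL_METHOD;
}

/*
 * Hypothetical probe: returns 1 if a float expression keeps more
 * precision than float, 0 otherwise.  Assumes round-to-nearest.
 * 1 + FLT_EPSILON/4 rounds back to 1 in float precision, but the small
 * term survives if the sum is evaluated in double or extended precision.
 */
int float_has_extra_precision(void)
{
    volatile float one = 1.0f;
    volatile float tiny = FLT_EPSILON / 4;

    return ((one + tiny) - one) != 0.0f;
}

/* Returns 1 if <math.h> defines float_t as something wider than float. */
int float_t_is_wider_than_float(void)
{
    return sizeof(float_t) > sizeof(float);
}

/* Abort if <float.h> claims pure float evaluation but the probe disagrees. */
void check_float_h(void)
{
    if (FLT_EVAL_METHOD == 0)
        assert(!float_has_extra_precision());
}
```

If float_t_is_wider_than_float() returns 1 while
float_has_extra_precision() returns 0, the build is in the case described
above: float_t says double but floats are evaluated as floats.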

> - The cost and complexity associated with making every function in
>  libm detect and adapt to dynamic precision changes is prohibitive.

Same as for dynamic rounding direction changes.  Actually, much lower
cost and complexity than for rounding direction.  For rounding direction,
it is actually useful to keep the caller's mode, and supporting this
would require making sure every step of every function works right in
every mode.  For rounding precision, we can just switch to a mode that
works for every function that needs it, and most don't need it except
for bizarre environments (like forcing single precision and calling
extended precision functions and expecting them to return any particular
precision).
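
A minimal sketch of what "just switch" could look like, assuming
fpsetprec() and FP_PE from <ieeefp.h> on FreeBSD/i386.  The wrapper and
kernel names are hypothetical, not anything in libm, and on other
platforms the wrapper degenerates to a plain call.  It also relies on the
compiler not moving FP operations across the mode changes, which is
exactly the compiler support that is in question; calling the kernel
through a function pointer is used here to make that less likely.

```c
#include <assert.h>

#if defined(__FreeBSD__) && defined(__i386__)
#include <ieeefp.h>
#define HAVE_FPSETPREC 1    /* hypothetical feature macro for this sketch */
#endif

/*
 * Hypothetical wrapper: run a long double kernel with the FPU forced to
 * extended precision, then restore the caller's precision.
 */
long double
with_extended_precision(long double (*kernel)(long double), long double x)
{
    long double r;

    assert(kernel != 0);
#ifdef HAVE_FPSETPREC
    fp_prec_t saved = fpsetprec(FP_PE);    /* returns the old precision */
    r = kernel(x);
    fpsetprec(saved);
#else
    r = kernel(x);                         /* nothing to switch here */
#endif
    return r;
}

/* Stand-in for a function that needs extended precision internally. */
long double
example_kernel(long double x)
{
    return x * x + 1.0L;
}
```

Only the few functions that need extended precision internally would be
wrapped like this; the rest would not pay for the mode switch.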

> I have updated the manpage for fpsetprec() to explain that changing
> the FPU precision isn't supported by the compiler or libraries.

Bruce