catrig[fl].c and inexact

Sat May 13 02:02:32 UTC 2017

On Fri, 12 May 2017, Steve Kargl wrote:

> So, I've been making improvements to my implementations of
> the half-cycle trig functions.  In doing so, I decide to
> add WARNS=2 to msun/Makefile.  clang 4.0.0 dies with an
> error about an unused variable in raise_inexact() from
> catrig[fl].c.
>
> /usr/home/kargl/trunk/math/libm/msun/src/catrigl.c:195:2: error: unused variable
>      'junk' [-Werror,-Wunused-variable]
>        raise_inexact();
>        ^
> /usr/home/kargl/trunk/math/libm/msun/src/catrigl.c:56:45: note: expanded from
>      macro 'raise_inexact'
> #define raise_inexact() do { volatile float junk = 1 + tiny; } while(0)
>                                            ^
> Grepping catrig.o for the variable 'junk' suggests that 'junk' is
> optimized out (with at least -O2).

Just another bug in clang.  Volatile variables cannot be optimized out
(if they are accessed).

> A quick and dirty patch to achieve the intent of the original
> code follows.  It would be nice if some would like to commit
> the patch.  Of course, you may want to wait for Bruce to
> review the diff.
>
> Index: src/catrig.c
> ===================================================================
> --- src/catrig.c	(revision 1935)
> +++ src/catrig.c	(working copy)
> @@ -37,7 +37,7 @@ __FBSDID("$FreeBSD: head/lib/msun/src/catrig.c 313863
> #define isinf(x)	(fabs(x) == INFINITY)
> #undef isnan
> #define isnan(x)	((x) != (x))
> -#define	raise_inexact()	do { volatile float junk = 1 + tiny; } while(0)
> +#define	raise_inexact(x)	do { (x) = 1 + tiny; } while(0)
> #undef signbit
> #define signbit(x)	(__builtin_signbit(x))
>
> @@ -315,7 +315,7 @@ casinh(double complex z)
> 		return (z);
>
> 	/* All remaining cases are inexact. */
> -	raise_inexact();
> +	raise_inexact(new_y);
>
> 	if (ax < SQRT_6_EPSILON / 4 && ay < SQRT_6_EPSILON / 4)
> 		return (z);

Now it doesn't take compiler bugs to optimize it out, since new_y is not
volatile, and a good compiler would optimize it out in all cases.  new_y
is obviously unused before the early returns, so it doesn't need to be
evalated before the returns as far as the compiler can see.  Later,
new_y is initialized indirectly, and the compiler can see that too (not
so easily, so it can see that raise_inexact() has no effect except possibly
for its side effect of raising inexact for 1 + tiny.

The change might defeat the intent of the original code in another way.
'junk' is intentionally independent of other variables, so that there are
no dependencies on it.  If the compiler doesn't optimize away the assignment
to new_y, then it is probably because it doesn't see that the assignment is
dead, so there is a dependency.

Actually, we want the variable 'junk' to be optimized away.  We only want
the side effect of evaluating 1 + tiny.  Compilers have bugs evaluating
expressions like 1 + tiny, tiny*tiny and huge*huge, and we use assignments
of the result to volatile variables in tens if not hundreds of places to
try to work around compiler bugs.  If that doesn't work here, then all the
other places are probably broken too.  The other places mostly use a static
volatile, while this uses an auto volatile.  'tiny' is also volatile, as
required for the standard magic.  I planned to fix all this magic using
macros like raise_inexact().

Another subtety in the macro is that variable is float instead of double
to possibly allow optimizations.  Since the variable shouldn't be
optimized away, it will waste sizeof(var) for each use of the macro.  A
file scope variable would work better here, but the macro is written to
be self-contained to make it easier to use.  The change also defeats that.

Whether not evaluating 1 + tiny at compile time is a compiler bug is
delicate.  We don't have any C99 compilers yet, since gcc and clang
don't support #pragma FENV_ACCESS ON/OFF.  The pragma should be set
to ON before the magic accesses, but we don't do that because it would
be a lot of churn and we know that the pragma doesn't work.  We more
or less depend on the default state of the pragma being ON, but in
gcc-4.2.1 it is documented as being OFF unless compiled with
-frounding-math when it is documented as being ON, and for clang it
is undocumented.  -frounding-math is too inefficient to use by default,
and another bug in clang is that it is not even supported.

With macro or at least inline wrappers, the #pragma should only be
needed in a few places.

clang-3.9.0 seems to be only partly broken here.  Volatile works correctly
for v = huge*huge and also for v = 1+tiny provided v is static instead of
auto.  It also works to declare 'junk' as __unused.

The following don't work with either clang-3.9.0 or gcc-4.2.1:
- declaring 'junk' as __used (syntax error)
- the expression 1+tiny not assigned to anything, or 1+tiny assigned to
   an __unused non-volatile variable.  This gives the weird code of loading
   'tiny' (because the compiler handles read accesses to volatile variables
   correctly), but not adding 1 (because the compiler doesn't know that
   adding 1 has a side effect, or is optimizing for FENV_ACCESS OFF).

The following is documented to not work with gcc-4.2.1:
- #pragma FENV_ACCESS ON.  clang handles this correctly by warning that
   this is unsupported, but this makes it even more unusable.  gcc-4.2.1
   doesn't warn, so it is hard to tell if it worked.

Bruce