catrig[fl].c and inexact

Sat May 13 16:05:38 UTC 2017

On Fri, 12 May 2017, Steve Kargl wrote:

> On Sat, May 13, 2017 at 11:35:49AM +1000, Bruce Evans wrote:
>> On Fri, 12 May 2017, Steve Kargl wrote:
>>
>>> ...
>>> /usr/home/kargl/trunk/math/libm/msun/src/catrigl.c:56:45: note: expanded from
>>>      macro 'raise_inexact'
>>> #define raise_inexact() do { volatile float junk = 1 + tiny; } while(0)
>>>                                            ^
>>> Grepping catrig.o for the variable 'junk' suggests that 'junk' is
>>> optimized out (with at least -O2).

It is a local variable, so should be and is allocated on the stack, so
you will never find it using grep.  The problem seems to be that all
compilers generated the intended code, but clang warns anyway.

>> Just another bug in clang.  Volatile variables cannot be optimized out
>> (if they are accessed).
>
> Does this depend on scope?  'junk' is local to the do {...} while(0);
> construct.  Can a compiler completely eliminate a do-nothing scoping
> unit?  I don't know C well enough to know.  I do know what I have
> observed in clang.

The semantics of volatile are not well specified, but as a practical matter standards shouldn't
specify much and compilers should be very conservative.

BTW, I recently noticed that volatile doesn't work right in bus space
macros.  Some reduce to *(volatile int *)var = val, where var is for
memory-mapped i/o that takes 10000 times as long as normal memory to
access.  Compilers still unroll loops setting such variables.  This is
only a pessimization for space.
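
A rough sketch of the pattern, with hypothetical names (REG_WRITE and
reg_write_n are not real bus space macros).  Every store through the
volatile lvalue has to be emitted, but nothing stops the compiler from
unrolling the loop around it:

```c
/* Hypothetical stand-in for a bus space write macro. */
#define REG_WRITE(var, val)	(*(volatile int *)(var) = (val))

/*
 * Store n successive values to the same location.  Each store is a
 * volatile access and so cannot be merged with or dropped in favour of
 * the others.  Returns the number of stores requested.
 */
int
reg_write_n(int *reg, int val, int n)
{
	int i;

	for (i = 0; i < n; i++)
		REG_WRITE(reg, val + i);
	return (i);
}
```

Here an ordinary int stands in for the device register, so the only
visible result is the last value stored.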

>>> ...
>>> @@ -315,7 +315,7 @@ casinh(double complex z)
>>> 		return (z);
>>>
>>> 	/* All remaining cases are inexact. */
>>> -	raise_inexact();
>>> +	raise_inexact(new_y);
>>>
>>> 	if (ax < SQRT_6_EPSILON / 4 && ay < SQRT_6_EPSILON / 4)
>>> 		return (z);
>>
>> Now it doesn't take compiler bugs to optimize it out, since new_y is not
>> volatile, and a good compiler would optimize it out in all cases.
>
> I've yet to find a good compiler.  They all seem to have bugs.
>
>> new_y
>> is obviously unused before the early returns, so it doesn't need to be
>> evaluated before the returns as far as the compiler can see.  Later,
>> new_y is initialized indirectly, and the compiler can see that too (not
>> so easily), so it can see that raise_inexact() has no effect except possibly
>> for its side effect of raising inexact for 1 + tiny.
>
> The later call passes the address of new_y to the routine.  How
> can the compiler short of inlining the called routine know that
> the value assigned to new_y isn't used?

The compiler does full inlining even when you don't want it.  Full
analysis of the whole source file is fundamental for generating useful
warnings with -Wunused.  Without full analysis, the compiler would
have to assume that new_y is used uninitialized and either suppress
warnings for all variables that might be initialized indirectly
(including via aliased pointers), or generate many bogus warnings
that variables "might be" used uninitialized.  Old compilers mostly
did the latter, and we still see occasional spurious warnings from
gcc-4.2.1.
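
A small sketch of the indirect-initialization case.  The helper below is a
hypothetical stand-in, not the real routine in catrig.c; only the shape
(early return, then initialization through a pointer) follows the
discussion above:

```c
/* Hypothetical helper: initializes *new_y through the pointer. */
static void
do_hard_work(double y, double *new_y)
{
	*new_y = y * 2;
}

double
use_indirect_init(double y)
{
	double new_y;		/* not initialized at the declaration */

	if (y == 0)
		return (y);	/* early return: new_y is never needed */
	do_hard_work(y, &new_y);
	return (new_y);		/* initialized only via the callee */
}
```

Without looking inside do_hard_work(), a compiler cannot tell whether
new_y is set before the final return, so it must either stay silent or
warn that new_y "might be" used uninitialized.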

Old compilers also have man pages in which this is partly documented.
gcc-3.3.3(1) says that:
- -Wuninitialized is null without -O
- -Wuninitialized is never generated for volatile variables
- -Wuninitialized is not the default since gcc is not smart enough to
   handle it well
gcc-4.2.1(1) says much the same, plus that -Wall implies -Wuninitialized.
It still says that the compiler is not smart, and doesn't seem to document
improvements that make this warning reasonable as the default with -Wall.
This is mostly because -O now implies -funit-at-a-time, which I usually
don't want, but which gives the full analysis needed for -Wuninitialized
and -Wunused.  I usually don't want this because:
- it slows down compilation
- it allows unwanted inlining
- it allows unportable code.
clang doesn't support -funit-at-a-time.

>> The change might defeat the intent of the original code in another way.
>> 'junk' is intentionally independent of other variables, so that there are
>> no dependencies on it.  If the compiler doesn't optimize away the assignment
>> to new_y, then it is probably because it doesn't see that the assignment is
>> dead, so there is a dependency.
>
> It may defeat the intent of the original code, but it seems that
> the original code provokes undefined behavior.

Defined, but perhaps not what is wanted.  It is using -W flags that gives
undefined behaviour.  They are undefined by the C standard, and also
undefined by compilers with stub man pages.

>> Actually, we want the variable 'junk' to be optimized away.  We only want
>> the side effect of evaluating 1 + tiny.  Compilers have bugs evaluating
>> expressions like 1 + tiny, tiny*tiny and huge*huge, and we use assignments
>> of the result to volatile variables in tens if not hundreds of places to
>> try to work around compiler bugs.  If that doesn't work here, then all the
>> other places are probably broken too.  The other places mostly use a static
>> volatile, while this uses an auto volatile.  'tiny' is also volatile, as
>> required for the standard magic.  I planned to fix all this magic using
>> macros like raise_inexact().
>
> If you plan to fix the magic with raise_inexact, then please
> test with a suite of compilers.  AFAICT, clang is optimizing
> out the code.  I haven't written a testcase to demonstrate this
> as I have other irons in the fire.

I only tested with 4 compilers when I wrote it.  Actually, we agreed
not to worry about compiler bugs for setting fenv, especially for
compilers with even more of them than gcc. libm only has the volatile
hack needed to fix huge*huge for clang in some places (gcc evaluates
huge*huge at run time but tiny*tiny at compile time, so libm has more
volatile hacks for the latter).  Not to mention hacks to remove extra
precision for huge*huge and tiny*tiny.  On i386 with i387, huge*huge
doesn't overflow since it is evaluated in extra precision.  The
wrong result is returned and the wrong result is used if it is assigned
to a variable that can hold the extra precision.  Overflow only occurs
if the variable is converted to float or double, and STRICT_ASSIGN() or
a volatile hack must be used for this to work around other compiler
bugs (which are actually features, but not allowed by C standards).
C11 and compiler non-support for C11 breaks this further.  C11 adds
the extra pessimization and subtraction of value of requiring extra
precision (and range) to be destroyed on function return.  clang ignores
this requirement.  Newer gcc supports it under certain pessimal
CFLAGS including -std=c11.
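
A sketch of the volatile hack for huge*huge, assuming a value of 1e300 for
'huge' (the thread does not give one) and a hypothetical function name.
It shows only the store to a volatile double, not STRICT_ASSIGN():

```c
#include <math.h>

/* Assumed value; volatile so the product is not folded at compile time. */
static const volatile double huge = 1e300;

/*
 * Hypothetical helper: multiply at run time and store the product to a
 * volatile double.  The store forces conversion to double, so any extra
 * precision and range is discarded and the overflow becomes visible.
 */
int
overflows_after_store(void)
{
	volatile double result;

	result = huge * huge;
	return (isinf(result) != 0);
}
```

On i386 with i387 the product itself can stay finite in the extended
format; it is the conversion on the store to 'result' that overflows.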

Bruce.