svn commit: r232275 - in head/sys: amd64/include i386/include pc98/include x86/include

Mon Apr 9 00:26:19 UTC 2012

On Sat, 7 Apr 2012, David Schultz wrote:

> On Fri, Mar 02, 2012, Tijl Coosemans wrote:

Hmm, old news.  I think I already replied, but now notice some more details.

>> Thanks, that was quite informative. C11 does say something about the
>> FP env and signals now though:
>>
>> ``When the processing of the abstract machine is interrupted by receipt
>> of a signal, the values of objects that are neither lock-free atomic
>> objects nor of type volatile sig_atomic_t are unspecified, as is the
>> state of the floating-point environment. The value of any object
>> modified by the handler that is neither a lock-free atomic object nor
>> of type volatile sig_atomic_t becomes indeterminate when the handler
>> exits, as does the state of the floating-point environment if it is
>> modified by the handler and not restored to its original state.''

This apparently allows signal handlers to be called with the FP env
in an undefined state (as in FreeBSD-4).  But this is a large change
relative to C99, since C99 says nothing about the floating point state
for signal handlers, and its abstract machine requires FP expressions
like "auto double four = 2.0 + 2.0;" to work.  Does "unspecified"
include "undefined", or does the requirement for the abstract machine
to not give undefined behaviour have precedence over the allowance for
the FP env to be anything?

>> This means a signal handler must not rely on the state of the FP env.
>> It may install its own FP env if needed (e.g. FE_DFL_ENV), but then it
>> must restore the original before returning. This allows for the
>> rounding mode to be silently modified for integer conversions for
>> instance.
>>
>> If longjmp is not supposed to change the FP env then, when called from
>> a signal handler, either the signal handler must install a proper FP
>> env before calling longjmp or a proper FP env must be installed after
>> the target setjmp call. Otherwise the FP env is unspecified.
>
> There are two reasonable ways to handle the floating point control
> word.  FreeBSD treats it as a register, resetting it on signal
> handler entry and restoring it on longjmp or signal handler
> return.  Virtually every other OS (e.g., Linux, NetBSD, Solaris)
> treats it as global state, leaving it up to the signal handler to
> preserve it as needed.

I checked what Linux-2.6.10 actually does.  It does nothing as drastic
as passing the interrupted FP environment to signal handlers.  It just
provides a clean FP env for signal handlers, like FreeBSD-5+ signal
handlers do, except more cleanly for FP SIGFPE on x86:
    FreeBSD-[1-4] SIGFPE handling:
       save exception flags in memory
       clear exception flags in i387
       call handler with this unclean state
    FreeBSD-[5-10] SIGFPE handling:
       convert exception flags to a signal code.  Lose details in
         translation.  Forget to merge the SSE flags when doing this.
         So the signal code cannot be trusted (AFAIR, it also doesn't
         distinguish between an i387 and an SSE exception.  Better
         yet, npxtrap() doesn't distinguish, so it blindly translates
         for i387 when the exception was for SSE).
       clear exception flags in i387.  Do this even if the exception was
         for SSE.  Forget to do anything with the SSE flags.
       call handler with a different, completely clean state
    Linux-2.6.10 SIGFPE handling:
       (not sure if it has a signal code)
       don't clear exception flags in i387
       call handler with a different, completely clean state
The result is that if the signal handler just returns, then:
- under FreeBSD, iff the SIGFPE was for the i387, then the fault doesn't
   repeat
- under Linux, and under FreeBSD iff the SIGFPE was for SSE, the fault
   does repeat
- under FreeBSD, for both cases the i387 exception flags are broken
   (lost), but the SSE exception flags work (are preserved).
Of course, returning from a SIGFPE handler gives undefined behaviour.

This (not just the different behaviour) causes the following problems:
- if the signal handler just returns, nothing good happens for the SIGFPE
   case (except for integer SIGFPE)
- if the signal handler wants to fix up the FP env before returning,
   then it has very large portability problems even for fixing the
   exception flags in the above 3 classes of behaviour.  But a fixup
   is usually essential if the handler is for FP SIGFPE.
- if the signal handler longjmp()s, then it gets the following behaviour:
   - under FreeBSD, it gets the control word restored to that at the time
     of the setjmp() (modulo some bugs in some versions for SSE); similarly
     for the exception flags except the bugs are now features (it's
     best not to touch the exception flags)
   - under Linux-2.6.10, it gets a clean control and status word from the
     signal handler's FP env (unless the signal handler has uncleaned them).

> Both approaches have their merits.  FreeBSD's approach provides
> better semantics.  Library functions, round-to-integer on most
> CPUs, and other things may temporarily change the rounding mode.
> Most programmers don't think about that, but on Linux, if an async
> signal were delivered at the wrong time and did a longjmp, the
> rounding mode would be in an unexpected state.  Most programmers

Its state will be clean, i.e., FE_TONEAREST.  This is OK for fixing
up temporary changes to it, but bad if it was set to another mode
using fesetround().  The setting may have been either before or
after the setjmp().  I think C99 wants it to be the setting of the
most recent fesetround(), but FreeBSD restores the setting to the
most recent one before the setjmp().

> don't think about that; even a program that never changes the
> rounding mode explicitly could wind up in round-to-zero mode after
> jumping out of a signal handler.

That would only happen in Linux after an explicit fesetround() to
FE_TOWARDZERO in the signal handler.

> The main advantage of the alternative approach is that it avoids
> the overhead of saving and restoring the floating point control
> word.  Many programs don't even use floating point, and the
> efficiency is important for programs that use longjmp frequently,
> e.g., to implement exceptions.
>
> Either way, note the importance of being consistent: If the FP env
> gets clobbered automatically on entry to a signal handler, then
> longjmp must restore what the application had before.  Personally,
> I'm not opposed to changing both signal handlers and longjmp to
> match what the rest of the world does, but this isn't just about
> the mxcsr, as suggested previously.

The rest of the world is already perfectly inconsistent, since it clobbers
the env for signal handlers, and I don't see it changing now that C11
encourages the reverse.

I think the overhead is unimportant.  fnstcw in setjmp() takes 4 cycles
(latency) on AthlonXP.  fldcw in longjmp() takes 11.  Hopefully this is
in parallel so it takes less than 1 cycle each (throughput).  (But I
never got anywhere trying to hide the latency of fxsave/fxrstor.)  Some
other arches have hundreds if not thousands of general registers to
save where i386 has only 11, so a few more cycles for FP would be even
more in the noise.

Bruce