Does POSIX say anything about the sign in NaNs?

Thu Dec 25 14:01:00 UTC 2014

On Wed, 24 Dec 2014, Pedro Giffuni wrote:

> I got the attached patch from OpenBSD.
>
> It says:
> ____
> Show the sign for NaN as per POSIX; from Elliott Hughes.
> ok martynas@, millert@, doug@
> ____
>
> I can't find a reference in POSIX documentation to support it though.

The behaviour is implementation-defined.  From n869.txt for printf:

X                A  double  argument  representing  an  infinity   is
X                converted in one of the styles [-]inf or [-]infinity
X                 --   which  style  is  implementation-defined.    A
X                double  argument  representing a NaN is converted in
X                one of the styles [-]nan or [-]nan(n-char-sequence)
X                 --  which style, and the  meaning  of  any  n-char-
X                sequence,    is   implementation-defined.    The   F
X                conversion specifier produces INF, INFINITY, or  NAN
X                instead of inf, infinity, or nan, respectively.220)

"style" is not clearly defined.  The format for input in strtod()
is formally defined as [-]something without using the term "style",
but the specification for actually handling the minus sign is even less
complete (see below).

The library intentionally suppresses the sign for NaNs on output.
This is consistent with it ignoring the sign for NaNs on input.
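
To check which style a given libc picks, a minimal sketch along these
lines can be used.  The helper names prints_minus() and skip_sign() are
made up for illustration; only snprintf(), copysign() and the NAN macro
are standard C99.  The only portable expectation is that the output,
after an optional sign, starts with "nan".

```c
#include <assert.h>
#include <math.h>
#include <stdio.h>
#include <string.h>

/*
 * Format x with "%f" into buf and report whether the output starts with
 * a minus sign.  For a NaN, C99 allows [-]nan or [-]nan(n-char-sequence),
 * so the result for a NaN is implementation-defined.
 */
static int
prints_minus(double x, char *buf, size_t len)
{
	int n = snprintf(buf, len, "%f", x);

	assert(n > 0 && (size_t)n < len);
	return (buf[0] == '-');
}

/* Hypothetical helper: skip an optional leading sign character. */
static const char *
skip_sign(const char *s)
{
	return ((*s == '-' || *s == '+') ? s + 1 : s);
}
```

Calling prints_minus(copysign(NAN, -1.0), buf, sizeof(buf)) returns 0 on
a libc that suppresses the sign and 1 on one that shows it; both are
conforming.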

> Anyone has a reason why we shouldn't adopt it, or a reference I can quote
> on the commit log?

It is not needed, and is inconsistent with the treatment for input.
It is not very useful.  Consistent, documented support for the
[-]nan(n-char-sequence) style would be useful.  This support should
be like what gdb does, but better.

Leave it for the gdtoa vendor to change.

For input, the details are unspecified (not even
implementation-defined) and very machine-dependent in practice even for
negating normal numbers.  From n869.txt for strtod:

X        string.   If  the subject sequence begins with a minus sign,
X        the  sequence  is  interpreted  as negated.235)  A character

footnote 235:
X           It is unspecified  whether  a  minus-signed  sequence  is
X           converted  to  a  negative number directly or by negating
X           the value resulting  from  converting  the  corresponding
X           unsigned  sequence  (see  F.5); the two methods may yield
X           different results  if  rounding  is  toward  positive  or
X           negative  infinity.   In either case, the functions honor
X           the sign of zero if  floating-point  arithmetic  supports
X           signed zeros.

This is under-specified.  "interpreted as negated" is not defined
anywhere, so it must have its English meaning, and that meaning is
unclear in all cases where rounding or NaNs are involved, and also for
the arcane -0.  The footnote doesn't specify anything, since footnotes
are not part of the standard.  It just gives the hint that any
reasonable method may be used, although the methods give different
results.  It gives the hint that -0 should work.  It doesn't say
anything for NaNs.  The two methods may give different results for
NaNs too.  Direct interpretation sets the minus bit, as if by copysign()
from -1.  Whether negating a NaN toggles its sign bit is very
implementation-dependent.
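
The two methods from the footnote can be sketched as follows.  This is
only an illustration, not what any particular strtod() does; the
function names negate_direct() and negate_arith() are invented here.
The volatile is there so that the compiler has to emit a real negation
instead of folding it at compile time.

```c
#include <math.h>

/* Direct method: force the sign bit on, as if by copysign() from -1. */
static double
negate_direct(double x)
{
	return (copysign(x, -1.0));
}

/*
 * Arithmetic method: negate the already-converted value.  Whether this
 * toggles the sign bit of a NaN depends on the instructions that the
 * compiler chooses for the negation.
 */
static double
negate_arith(double x)
{
	volatile double v = x;

	return (-v);
}
```

For ordinary numbers the two agree in the default rounding mode.  For a
NaN, only negate_direct() has a predictable sign bit; checking
signbit(negate_arith(NAN)) on several compiler and flag combinations is
a quick way to see the machine dependency.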

   (The behaviour differs even on i386.  Negation is normally
   implemented using fchs in i387.  SSE doesn't have fchs, so
   negation must be implemented as subtraction from 0, and IIRC
   this doesn't change the sign bit for NaNs.  This gives the
   following behaviours on i386 for -x on the NaN _variable_ x:
   - gcc for all precisions: all cases use fchs, so flip the sign
   - clang -msse*: clang is incompatible.  It uses SSE* for float
     and double precision, but fchs for long double precision.
     Thus -x flips the sign bit for long double precision only.
   And on amd64:
   - both gcc and clang use SSE for float and double precision, so
     the behaviour is the same as for clang -msse* on i386.

   Toggling the sign bit for +-0 has the same machine-dependencies
   as for NaNs if done naively.  The library cares about this case.
   I think it uses a direct method (more hackish than copysign)
   since anything involving an expression would be even more
   unportable than hacking on the sign bit.)
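
A direct method of the kind described above might look like the
following sketch.  It is not the library's code; flip_sign_bit() and
sign_bit_set() are hypothetical names, and the code assumes that double
is a 64-bit IEEE 754 type with the sign in the top bit.  Going through
memcpy() avoids any floating-point instruction, so the result does not
depend on how the compiler implements negation.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: double is a 64-bit type with the sign in bit 63. */
_Static_assert(sizeof(double) == sizeof(uint64_t),
    "this sketch assumes a 64-bit double");

/* Flip the sign bit without doing any floating-point arithmetic. */
static double
flip_sign_bit(double x)
{
	uint64_t bits;

	memcpy(&bits, &x, sizeof(bits));
	bits ^= UINT64_C(1) << 63;
	memcpy(&x, &bits, sizeof(x));
	return (x);
}

/* Report whether the sign bit is set, again without arithmetic. */
static int
sign_bit_set(double x)
{
	uint64_t bits;

	memcpy(&bits, &x, sizeof(bits));
	return ((int)(bits >> 63));
}
```

This works the same way for zeros, NaNs and ordinary numbers, which is
the point of avoiding an expression such as -x or 0 - x.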

I'm not sure how the library handles "-nan" on input, but since it
produces a NaN with a sign bit of 0 on i386, it apparently forces
the sign bit to 0, unlike what it does for "-0" and "-0.1"
(replace 0.1 by a number that should be affected by the rounding
mode if necessary to get an interesting example).
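
A small probe for the input side, as a sketch: parsed_sign() is a
made-up helper that only wraps the standard strtod() and signbit().  The
results for "-0" and "-0.1" are pinned down well enough in practice; the
result for "-nan" is exactly the part that is left open.

```c
#include <math.h>
#include <stdlib.h>

/* Parse s with strtod() and report whether the result has its sign bit set. */
static int
parsed_sign(const char *s)
{
	return (signbit(strtod(s, NULL)) != 0);
}
```

Comparing parsed_sign("-nan") with parsed_sign("nan") on a given system
shows whether its strtod() keeps or drops the sign for NaNs.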

In my version of libm, most of the 2-arg functions have hackish changes
related to the above machine dependencies to force consistent results
for pairs of NaNs.  To stop the result of x+y (where x and y are NaNs)
depending on the quiet bit and the precision (due to it being evaluated
in different register sets for different precisions) and sometimes
also on the compiler ordering of x+y, expressions are forced to long
double precision early.  Hardware generally chooses one of x or y
(quieted) for the result of x+y.  If the result depends on the ordering
and the ordering is unpredictable, then sign bits in NaNs get lost
just like value bits in NaNs.
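
The dependency on operand order can be observed with a sketch like the
one below.  It is not code from libm; all of the helper names are
invented, and it assumes the IEEE 754-2008 binary64 encoding, where a
set top mantissa bit means "quiet".  Nothing is claimed here about which
operand wins, since that is the machine-dependent part.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret raw bits as a double (assumes a 64-bit double). */
static double
from_bits(uint64_t bits)
{
	double x;

	memcpy(&x, &bits, sizeof(x));
	return (x);
}

/* Reinterpret a double as raw bits. */
static uint64_t
to_bits(double x)
{
	uint64_t bits;

	memcpy(&bits, &x, sizeof(bits));
	return (bits);
}

/* Build a quiet NaN with the given sign (0 or 1) and payload. */
static double
make_qnan(int sign, uint64_t payload)
{
	uint64_t bits = UINT64_C(0x7ff8000000000000) |
	    (payload & UINT64_C(0x0007ffffffffffff));

	if (sign)
		bits |= UINT64_C(1) << 63;
	return (from_bits(bits));
}

/* Add at run time; volatile stops the compiler from folding the sum. */
static double
add_nans(double x, double y)
{
	volatile double a = x, b = y;

	return (a + b);
}
```

With x = make_qnan(0, 1) and y = make_qnan(1, 2), comparing
to_bits(add_nans(x, y)) against to_bits(add_nans(y, x)) shows whether
the sign and payload of the result follow the operand order on a given
machine and compiler.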

Bruce