svn commit: r270227 - head/sys/sys

Tue Aug 26 18:13:49 UTC 2014

On Tue, 26 Aug 2014, Benjamin Kaduk wrote:

> On Tue, Aug 26, 2014 at 9:58 AM, Bruce Evans <brde at optusnet.com.au> wrote:
>
>>
>> That would be a further obfuscation.  The *INT<n>C() macros expand to
>> integer constant expressions of the specified type suitable for use
>> in #if preprocessing directives.  (It is otherwise difficult to
>> determine the correct suffix to add to the constant to give it the
>> specified type).  There are no preprocessing directives here, so a
>> simple cast works.  The cast could also be applied to the other
>> operand but it is easier to read when applied to the constant.
>
> I thought that in C99, all integers in preprocessor evaluation were treated
> as if they were [u]intmax_t (6.10.1.4 in the n1256.pdf I have here).  I was
> only just skimming that part yesterday for unrelated reasons, though, so
> maybe I'm missing the bigger picture.

Yes, that makes it unclear what typed constants in cpp expressions are
useful for.  C has always been careful to make limits like UINT_MAX
have the correct type, but that seems to be worse than useless in cpp
expressions.  (Oops.  I was thinking that UCHAR_MAX had type u_char,
but the correctness actually goes the other way -- it is required to
have the type that u_char promotes to (signed int except on exotic
machines).)  UINT_MAX has type unsigned int.  So -UINT_MAX > 0 in normal
expressions.  But in cpp expressions, UINT_MAX promotes to intmax_t
before it is negated (since C's broken "value-preserving" promotion
rules apply).  So -UINT_MAX < 0 in cpp expressions.
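
A minimal sketch of the ordinary-expression half of this, assuming the
usual case where unsigned char promotes to signed int.  The function
names are made up for illustration.

```c
#include <limits.h>

/*
 * Nonzero if -UINT_MAX is positive in an ordinary expression.
 * UINT_MAX has type unsigned int, so the negation wraps modulo
 * UINT_MAX + 1 instead of going negative.
 */
int
negated_uint_max_is_positive(void)
{
	return (-UINT_MAX > 0);
}

/* The wrapped value itself: (UINT_MAX + 1) - UINT_MAX. */
unsigned int
negated_uint_max(void)
{
	return (-UINT_MAX);
}

/*
 * Nonzero if -UCHAR_MAX is negative.  This assumes a non-exotic
 * machine where UCHAR_MAX has type int, so plain signed negation
 * applies.
 */
int
negated_uchar_max_is_negative(void)
{
	return (-UCHAR_MAX < 0);
}
```

On such a machine all three checks are expected to hold, with
negated_uint_max() returning 1.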

There seem to be some compiler bugs in this.  In cpp expressions,
testing on clang on amd64 gives -UINT_MAX < 0 and -0xFFFFFFFF < 0, but
-0xFFFFFFFFU > 0.  The first 2 results are the same since UINT_MAX
is just 0xFFFFFFFF.
   (<limits.h> intentionally avoids using suffixes on constants if
   possible since it didn't use them in old versions, I don't like them,
   and I would have tried to undo any changes that added them.  Some
   buggy versions used them to break things like USHRT_MAX -- 0xFFFFU
   has the wrong type.)
But 0xFFFFFFFF and 0xFFFFFFFFU have the same type if u_int == uint32_t,
since the type of an unsuffixed hex constant is the type of lowest rank
that can represent it.
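
The observations above can be reproduced with something like the
following sketch.  It assumes 32-bit int and 64-bit intmax_t, and it
leaves UINT_MAX out because its spelling differs between <limits.h>
implementations.  The macro and function names are made up, and the
_Generic check needs a C11 compiler.

```c
/* The comparisons as the preprocessor evaluates them. */
#if -0xFFFFFFFF < 0
#define	CPP_UNSUFFIXED_IS_NEGATIVE	1
#else
#define	CPP_UNSUFFIXED_IS_NEGATIVE	0
#endif

#if -0xFFFFFFFFU > 0
#define	CPP_SUFFIXED_IS_POSITIVE	1
#else
#define	CPP_SUFFIXED_IS_POSITIVE	0
#endif

/* The same comparisons in ordinary (non-cpp) expressions. */
int
plain_unsuffixed_is_positive(void)
{
	return (-0xFFFFFFFF > 0);
}

int
plain_suffixed_is_positive(void)
{
	return (-0xFFFFFFFFU > 0);
}

/*
 * Nonzero if the unsuffixed constant has type unsigned int, i.e., the
 * same type as the suffixed one (C11 _Generic; assumes 32-bit int).
 */
int
unsuffixed_is_unsigned_int(void)
{
	return (_Generic(0xFFFFFFFF, unsigned int: 1, default: 0));
}
```

Under those assumptions both cpp macros expand to 1 and all three
functions return nonzero, so only the unsuffixed spelling changes sign
between cpp and ordinary expressions.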

gcc-3.3 on i386 under an old version of FreeBSD gives the same results,
except UINT_MAX is defined with a U suffix so the result for it
agrees with the wrong result for 0xFFFFFFFF.  Perhaps this is specified
somewhere, but it is bizarre for 0xFFFFFFFFU to promote not bug for bug
compatibly with the "value-preserving" rules, while the same type and
value spelled as 0xFFFFFFFF does promote bug for bug compatibly.

TenDRA (4.2) is normally more of a C compiler than gcc or clang, and it
gives very interesting errors for all 3 cpp expressions:

   [ISO C90 6.8.1]: Can't have target dependent '#if' at outer level.

The expression is target-dependent since the type of 0xFFFFFFFF depends
on the size of int.  Even the result of -1U > 0 is apparently target-
dependent in C90.  I think it is still target-dependent.  On exotic
targets, uintmax_t == u_int, so there is no promotion and -1U > 0,
but on normal targets 1U promotes to (intmax_t)1 and negating that
makes it negative.  I didn't know about this stupid rule 6.8.1.  It
mainly prevents you writing target-dependent cpp expressions to
determine the target.  It seems to be a bug in TenDRA.  6.8.1 only
says that it is implementation-defined whether the result is the
same as a non-cpp expression, and gives an example of a more useful
expression involving character constants being quite likely to give
a different result.

Sigh.  My compiler and TurboC handled cpp expressions better in 1988.
cpp was part of the compiler, so casts, sizeof() and floating point
worked in it, and the result of constant expressions didn't depend on
whether they were evaluated in cpp.  Running my compiler now gives
-0xFFFFFFFF > 0 for all spellings of 0xFFFFFFFF, since the compiler
is pre-C90 and never implemented "value-preserving" promotion.  It also
never supported 64-bit integers, so promotion doesn't apply in this
example, but it applies to -0xFFFF in 16-bit mode similarly.

>> The expression could also be written without a cast and without using
>> UINT64_C(), by using a 'ULL' suffix instead of 'LL'.  That would still
>> use the long long abomination, and be different obfuscation -- the
>> type of the constant doesn't really matter, but we need to promote
>> to the type of 'frac', that is, uint64_t.  'ULL' works because long
>> longs are at least 64 bits (and I think unsigned long longs are also
>> 2's complement), so their type is larger than uint64_t.
>
> Two's complement semantics are guaranteed for the fixed width types
> such as int64_t, but I'm not sure how that comes into play for unsigned
> types?

Unsigned types have 2's complement arithmetic, but might not have
purely 2's complement representations (except for unsigned char).  They
can have padding bits, perhaps with trap representations.  I think
that is the only complication allowed in C99 and later.  In C90, I
think the representation could be anything, but C99 only allows 3 types
of representations for integer types, and it allows inspecting
representations by copying to arrays of unsigned char and looking at
the bits.
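
A rough sketch of both points, with made-up function names: unsigned
arithmetic wraps modulo 2^N, the object representation can be copied
into an array of unsigned char for inspection, and padding bits show up
as a difference between the number of value bits and the storage size.

```c
#include <limits.h>
#include <string.h>

/* Unsigned arithmetic is modular: 0 - 1 wraps to UINT_MAX. */
unsigned int
zero_minus_one(void)
{
	unsigned int zero = 0;

	return (zero - 1);
}

/*
 * Copy the object representation of value into buf, which must hold
 * at least sizeof(unsigned int) bytes.  Returns the number of bytes
 * copied so the caller can walk over them.
 */
size_t
copy_representation(unsigned int value, unsigned char *buf)
{
	memcpy(buf, &value, sizeof(value));
	return (sizeof(value));
}

/* Count the value bits of unsigned int by shifting UINT_MAX down. */
unsigned int
value_bits(void)
{
	unsigned int n = 0;
	unsigned int v;

	for (v = UINT_MAX; v != 0; v >>= 1)
		n++;
	return (n);
}

/* Nonzero if unsigned int has padding bits on this implementation. */
int
has_padding_bits(void)
{
	return (value_bits() < sizeof(unsigned int) * CHAR_BIT);
}
```

On common implementations has_padding_bits() is expected to return 0,
but the standard does not promise that.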

Both long long and unsigned long long are at least 64 bits, but 64 bits
is not quite enough to guarantee representing int64_t since int64_t is
2's complement while long long might be 1's complement -- then it can't
represent INTMAX_MIN.  This problem doesn't affect the unsigned case.
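
For comparison, here is a sketch of the three spellings discussed in
the quoted text (cast, 'ULL' suffix, UINT64_C()).  The scale factor and
the function names are only illustrative and are not taken from the
commit.  The last function checks the range question for the signed
case.

```c
#include <limits.h>
#include <stdint.h>

/* Illustrative constant only: 2^64 / 10^9, truncated. */
#define	SCALE_CAST	((uint64_t)18446744073LL)
#define	SCALE_ULL	18446744073ULL
#define	SCALE_MACRO	UINT64_C(18446744073)

/* In each case the 32-bit operand is converted to a 64-bit unsigned
 * type before the multiplication. */
uint64_t
scale_with_cast(uint32_t nsec)
{
	return (nsec * SCALE_CAST);
}

uint64_t
scale_with_ull(uint32_t nsec)
{
	return (nsec * SCALE_ULL);
}

uint64_t
scale_with_macro(uint32_t nsec)
{
	return (nsec * SCALE_MACRO);
}

/* The unsigned case always has enough range; fail the build if not. */
#if ULLONG_MAX < UINT64_MAX
#error "unsigned long long cannot represent UINT64_MAX"
#endif

/*
 * Nonzero if long long covers the whole int64_t range.  This would be
 * 0 for a 64-bit 1's complement long long, which cannot represent
 * INT64_MIN.
 */
int
long_long_covers_int64(void)
{
	return (LLONG_MIN <= INT64_MIN && LLONG_MAX >= INT64_MAX);
}
```

All three functions should return the same value for any input, so the
choice between them is about readability rather than behaviour.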

Bruce