[RFC] Consistent numeric range for "expr" on all architectures

Wed Jun 29 01:10:01 UTC 2011

[I changed developers to standards instead of removing it]

On Tue, 28 Jun 2011, Stefan Esser wrote:

> On 28.06.2011 13:02, Poul-Henning Kamp wrote:
>> In message <4E09AF8E.5010509 at freebsd.org>, "Stefan Esser" writes:
>>
>>> Due to (false, according to BDE) considerations for POSIX compliance,
>>> the 64bit code was made conditional on a command line option in 2002.
>>
>> I think 64bit is the wrong thing to focus on, shouldn't it be
>> "intmax_t" so we will not have to revisit this again ?
>
> Well, actually it already *is* intmax_t, which happens to be 64bit
> on all architectures I checked ;-)
>
> My proposal is just to not produce overflows when easily avoidable.
> This takes little effort, simplifies the code and makes scripts more
> portable across architectures.
>
> Are there any supported architectures with intmax_t smaller than 64bit?

There cannot be, since C99 requires long long to be at least 64 bits
(counting the sign bit) and it requires intmax_t to be capable of
representing any value of any signed integer type.
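
This guarantee can be checked at compile time.  The following is only a
minimal sketch, assuming a C11 compiler for _Static_assert; the helper
name intmax_width() is made up for illustration:

```c
#include <limits.h>
#include <stdint.h>

/*
 * C99 requires LLONG_MAX >= 2^63 - 1, and intmax_t must be able to
 * represent every value of every signed integer type, so neither
 * assertion can fail on a conforming implementation.
 */
_Static_assert(LLONG_MAX >= 9223372036854775807LL,
    "long long is narrower than 64 bits");
_Static_assert(INTMAX_MAX >= LLONG_MAX,
    "intmax_t cannot represent every long long");

/* Hypothetical helper: storage size of intmax_t in bits. */
static int
intmax_width(void)
{
	return ((int)(sizeof(intmax_t) * CHAR_BIT));
}
```

On the architectures mentioned above, intmax_width() should return 64.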

While checking this, I noticed that:
- preprocessor arithmetic is done using intmax_t or uintmax_t.  This causes
   portability problems similar to the ones for expr -- expressions like
   ULONG_MAX + ULONG_MAX suddenly started in 1999 giving twice ULONG_MAX
   instead of ULONG_MAX-1, but only on arches where ULONG_MAX < UINTMAX_MAX.
   (I use unsigned values in this example to give defined behaviour on
   overflow, so that the expression ULONG_MAX + ULONG_MAX is not just a bug.
   expr doesn't have this complication.)
- C99 doesn't require intmax_t to be the logically longest type.  Thus it
   permits FreeBSD's rather bizarre implementation of intmax_t being plain
   long which is logically shorter than long long.
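
The first point above can be seen by evaluating the same expression in
both contexts.  This is a sketch only; CPP_SUM_WRAPPED and runtime_sum()
are names invented for the example:

```c
#include <limits.h>
#include <stdint.h>

/*
 * Preprocessor arithmetic: since C99, unsigned operands in #if are
 * evaluated as uintmax_t, so the sum wraps only at UINTMAX_MAX.
 */
#if ULONG_MAX + ULONG_MAX > ULONG_MAX
#define	CPP_SUM_WRAPPED	0	/* ULONG_MAX < UINTMAX_MAX: twice ULONG_MAX */
#else
#define	CPP_SUM_WRAPPED	1	/* ULONG_MAX == UINTMAX_MAX: ULONG_MAX - 1 */
#endif

/*
 * Ordinary arithmetic: the same expression has type unsigned long and
 * always wraps modulo ULONG_MAX + 1, giving ULONG_MAX - 1.
 */
static unsigned long
runtime_sum(void)
{
	return (ULONG_MAX + ULONG_MAX);
}
```

runtime_sum() is ULONG_MAX - 1 everywhere, while CPP_SUM_WRAPPED is 0 only
on arches where unsigned long is narrower than uintmax_t.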

Other points:
- `expr -e 10000000000000000000 + 0' (19 zeros) gives "Result too large",
   but it is the arg, not the result, that is too large.
   This message is strerror(ERANGE) after strtoimax() sets errno to ERANGE.
   `expr -e 1000000000000000000 \* 10' gives "overflow".  This message is
   correct, but it is in a different style to strerror() (uncapitalized,
   and more concise).
- `expr 10000000000000000000' (19 or even 119 zeros) gives no error.  It
   is documented that the arg is parsed as a string in this case, and the
   documentation for -e doesn't clearly say that -e changes this.  And -e
   doesn't change this if the arg clearly isn't a number
   (e.g., if it is 10000000000000000000mumble), or even if it is a non-decimal
   number (e.g., if it is 010, 0x10 or 10.0).  If the arg isn't a decimal integer,
   then (except for -e on decimal integers), there is an error irrespective
   of -e when arithmetic is attempted (e.g., adding 0).  The error message
   for this bogusly says "non-numeric argument" when the arg is numeric but
   not a decimal integer.
- POSIX requires brokenness for bases other than 10, but I wonder if an
   arg like 0x10 invokes undefined behaviour and thus can be made to
   work.  (I wanted to use a hex number since I can never remember what
   INTMAX_MAX is in decimal and wanted to type it in hex for checking
   the range and overflow errors.)  Allowing hex args causes fewer
   problems than allowing decimal args larger than INT32_MAX, since
   they are obviously unportable.  Some FreeBSD utilities, e.g., dd,
   support hex args and don't worry about POSIX restricting them.
- POSIX unfortunately requires args larger than INT32_MAX to be unportable
   (to work if longs are longer than 32 bits, else to give undefined (?)
   behaviour).  For portability there could be a -p switch that limits args
   to INT32_MAX even if longs are longer than 32 bits.
- I hope POSIX doesn't require benign overflow.  Thus treating all overflows
   as errors is good for portability and doesn't require any switch.
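
To illustrate the difference between an arg that is out of range and a
result that overflows, here is a rough sketch.  It is not the code in
expr; parse_operand() and checked_mul() are hypothetical helpers, and the
return codes are arbitrary:

```c
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>

/*
 * Hypothetical helper: parse a decimal integer operand.
 * Returns 0 on success, -1 if s is not a decimal integer,
 * -2 if it is one but does not fit in intmax_t.
 * Note that strtoimax() also accepts leading whitespace and a sign.
 */
static int
parse_operand(const char *s, intmax_t *out)
{
	char *end;

	errno = 0;
	*out = strtoimax(s, &end, 10);
	if (end == s || *end != '\0')
		return (-1);	/* e.g. "0x10", "10.0", "10mumble" */
	if (errno == ERANGE)
		return (-2);	/* the arg, not the result, is too large */
	return (0);
}

/*
 * Hypothetical helper: multiply without relying on signed overflow.
 * Returns 0 on success, -1 if a * b does not fit in intmax_t.
 */
static int
checked_mul(intmax_t a, intmax_t b, intmax_t *res)
{
	if (a > 0 && b > 0 && a > INTMAX_MAX / b)
		return (-1);
	if (a > 0 && b < 0 && b < INTMAX_MIN / a)
		return (-1);
	if (a < 0 && b > 0 && a < INTMAX_MIN / b)
		return (-1);
	if (a < 0 && b < 0 && b < INTMAX_MAX / a)
		return (-1);
	*res = a * b;
	return (0);
}
```

With 64-bit intmax_t, parse_operand("10000000000000000000", &v) returns -2,
and checked_mul(1000000000000000000, 10, &v) returns -1, so the two cases
could be reported with different messages.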

Bruce