[RFC] Consistent numeric range for "expr" on all architectures

Bruce Evans brde at optusnet.com.au
Wed Jun 29 23:32:18 UTC 2011


On Wed, 29 Jun 2011, Stefan Esser wrote:

> Am 29.06.2011 01:06, schrieb Bruce Evans:
>> Other points:
>> - `expr -e 10000000000000000000 + 0' (19 zeros) gives "Result too large",
>>   but it isn't the result that is too large, but the arg that is too large.
>>   This message is strerror(ERANGE) after strtoimax() sets errno to ERANGE.
>>   `expr -e 1000000000000000000 \* 10' gives "overflow".  This message is
>>   correct, but it is in a different style to strerror() (uncapitalized,
>>   and more concise).
>
> The patch that I sent with my first message fixes this. The message is
> changed from "Result too large" to "Not a valid integer: %s".
>
> ("non-numeric argument" is used in other places and I could adapt to
> that, though I prefer to also see which argument was rejected. But I
> think that "not a valid integer" better describes the situation.)

I prefer "operand too large".  A decimal integer operand that is too
large to be represented by intmax_t is not really invalid, but is just
too large.  We already have a different message for invalid.

From an old (?) version of the man page:

%    Arithmetic operations are performed using signed integer math.  If the -e
%    flag is specified, arithmetic uses the C intmax_t data type (the largest
                         ^^^^^^^^^^ actual arithmetic, not just parsing; but
                                    for expr [-e] 10000000000000000000 we are
                                    doing actual arithmetic -- see below
%    integral type available), and expr will detect arithmetic overflow and
%    return an error indication.  If a numeric operand is specified which is

"Numeric" is not defined anywhere in the man page.  This is the only
use of it in the man page.  It means "decimal integer" and should say
precisely that.  The only hint about this in the man page is the statement
that "all integer operands are interpreted in base 10".  The fuzziness
extends to error messages saying "non-numeric argument" instead of
"operand not a decimal integer [used in integer context]".

%    so large as to overflow conversion to an integer, it is parsed as a
%    string instead.  If -e is not specified, arithmetic operations and pars-

This specifically says that large operands are parsed as strings.
Strangely, since large operands are only checked for with -e, only -e can
get this right; without -e, large operands are not even detected.
However, this is a bug in the man page -- see below.

%    ing of integer arguments will overflow silently according to the rules of
%    the C standard, using the long data type.

This says that the -e case is broken, but doesn't override the statement
that large operands are parsed as strings.  Since the man page is wrong,
no override is needed.

I originally thought of using "argument" instead of "operand", but got the
better word from the above section of the man page.

> Without "-e" the numeric result is "undefined" and no error is signaled,
> since there was no test whether the conversion succeeded before I added
> it back in 2000.

I first thought that the error reporting must be delayed to when an operand
is used in an expression, even with -e.  But it is already delayed, and
the parsing works as specified in POSIX.  The parsing is just poorly or
incorrectly documented in the man page.

- The syntax in the man page doesn't seem to mention the degenerate
   expression <identifier>.  POSIX specifies this of course.  <identifier>
   can be either <integer> or <string>, where <integer> is an optional
   unary minus followed by digits, and <string> is any argument that is
   not an <integer> and not an operator symbol.

   Therefore, "expr -e 1000000000000000000" is not a syntax error as seems
   to be required by the man page; the arg in it forms a degenerate
   expression.  The arg is not a <string> since it is an <integer>.
   Therefore, the expression is numeric (I didn't check that POSIX says
   this explicitly).  Therefore, we are justified in applying strtoimax()
   to all the operands in the expression (all 1 of them) and getting a
   range error.

- The man page is broken in saying that unrepresentable numeric operands
   are parsed as strings instead.  Whether an operand is numeric is
   determined by the POSIX syntax which is purely lexical and doesn't
   depend on representability.  So for the degenerate expression with 1
   operand, the type of the expression is determined by the type of the
   operand which is determined lexically as described above.  Similarly
   for parenthesized degenerate expressions.  For non-degenerate
   expressions, the types of the operators and of the result are again
   mostly or always determined lexically by the types of the operands.
   For example, '=' means equality of integers if both operands are
   integers, but equality of strings if one or both operands is not
   an integer.

- In all cases, whether an operand is an integer is context-dependent,
   so args must not be classified early.  This seems to be done correctly,
   so the code conforms to the POSIX syntax although the man page doesn't.

The syntax is still broken as designed, since it doesn't allow +1 to
be an integer, and it requires octal integers to be misinterpreted
as decimal integers although no reasonable specification of decimal
integers allows them to start with a '0', and it doesn't support
non-decimal integers...

>> - POSIX requires brokenness for bases other than 10, but I wonder if an
>>   arg like 0x10 invokes undefined behaviour and thus can be made to
>>   work.  (I wanted to use a hex number since I can never remember what
>>   INTMAX_MAX is in decimal and wanted to type it in hex for checking
>>   the range and overflow errors.)  Allowing hex args causes fewer
>>   problems than allowing decimal args larger than INT32_MAX, since
>>   they are obviously unportable.  Some FreeBSD utilities, e.g., dd,
>>   support hex args and don't worry about POSIX restricting them.
>
> Does POSIX require that expr exits on illegal arguments?

Not sure.  It requires an exit status of 2 for invalid expressions.
For the "+" operator, the operands are required to be decimal
integers, but the error handling isn't so clearly specified.  For
the "&" operator, operand(s) are allowed to be null.

Anyway, hex numbers can't be put through this gap.  Since they are not
decimal integers, they are required to be interpreted as strings in
some contexts.  So "expr 0x10 \< 2" gives 1 because the string "0x10" is
less than the string "2".  This conflicts with "expr 16 \< 2" giving
0 since both operands are integers and of course 16 = 0x10 is not less
than 2.

>> - POSIX unfortunately requires args larger than INT32_MAX to be unportable
>>   (to work if longs are longer than 32 bits, else to give undefined (?)
>>   behaviour).  For portability there could be a -p switch that limits args
>>   to INT32_MAX even if longs are longer than 32 bits.
>
> Well, undefined behaviour can always be to return the correct result ;-)
>
> I'd be willing to add "-p" (effectively just make "-e" the default that
> can be overridden).

I now don't see any problem with -e.  Not even the one for degenerate
expressions that I thought I saw.

POSIX says that shell expressions should be preferred to expr, and for
shell expressions it has a non-null discussion of representability and
overflows.  It basically says that only long arithmetic is supported,
without even C's type suffixes which are needed to extend to unsigned
long arithmetic, but extensions are encouraged.

Bruce