[RFC] Consistent numeric range for "expr" on all architectures

Stefan Esser se at freebsd.org
Thu Jun 30 08:23:34 UTC 2011


Hi Bruce,

thank you for the detailed analysis!

On 30.06.2011 01:32, Bruce Evans wrote:
> On Wed, 29 Jun 2011, Stefan Esser wrote:
>> Am 29.06.2011 01:06, schrieb Bruce Evans:
>>> Other points:
>>> - `expr -e 10000000000000000000 + 0' (19 zeros) gives "Result too large",
>>>   but it isn't the result that is too large, but the arg that is too large.
>>>   This message is strerror(ERANGE) after strtoimax() sets errno to ERANGE.
>>>   `expr -e 1000000000000000000 \* 10' gives "overflow".  This message is
>>>   correct, but it is in a different style to strerror() (uncapitalized,
>>>   and more concise).
>>
>> The patch that I sent with my first message fixes this. The message is
>> changed from "Result too large" to "Not a valid integer: %s".
>>
>> ("non-numeric argument" is used in other places and I could adapt to
>> that, though I prefer to also see which argument was rejected. But I
>> think that "not a valid integer" better describes the situation.)
> 
> I prefer "operand too large".  A decimal integer operand that is too
> large to be represented by intmax_t is not really invalid, but is just
> too large.  We already have a different message for invalid.

Yes, I also thought the "non-numeric" message was misleading and that
"not a valid integer" was just slightly less misleading ;-)

But "operand too large" correctly describes the situation and I'll just
a test for that case (instead of just relying on strtoimax() to decide
about integer-ness).

According to POSIX, "+1" is not a valid integer operand in "expr"; should
a leading "+" also be detected and rejected?

>> From an old (?) version of the man page:
> 
> %    Arithmetic operations are performed using signed integer math.  If the -e
> %    flag is specified, arithmetic uses the C intmax_t data type (the largest
>                         ^^^^^^^^^^ actual arithmetic, not just parsing; but
>                                    for expr [-e] 10000000000000000000 we are
>                                    doing actual arithmetic -- see below 
> %    integral type available), and expr will detect arithmetic overflow and
> %    return an error indication.  If a numeric operand is specified which is
> 
> "Numeric" is not defined anywhere in the man page.  This is the only
> use of it in the man page.  It means "decimal integer" and should say
> precisely that.  The only hint about this in the man page is the statement
> that "all integer operands are interpreted in base 10".  The fuzziness
> extends to error messages saying "non-numeric argument" instead of
> "operand not a decimal integer [used in integer context]".

Ok, all numeric arguments should be called "decimal integer" in the
documentation and error messages (if applicable).

> %    so large as to overflow conversion to an integer, it is parsed as a
> %    string instead.  If -e is not specified, arithmetic operations and pars-
> 
> This specifically says that large operands are parsed as strings.
> Strangely, since large operands are only checked for with -e, only -e can
> get this right; without -e, large operands are not even detected.
> However, this is a bug in the man page -- see below.

I had already wondered about this case (large numbers being considered
strings), but I thought it was in compliance with POSIX. This will be
fixed by checking that operands are POSIX decimal integers.

> %    ing of integer arguments will overflow silently according to the rules of
> %    the C standard, using the long data type.
> 
> This says that the -e case is broken, but doesn't override the statement
> that large operands are parsed as strings.  Since the man page is wrong,
> no override is needed.

Is the -e case OK if decimal numbers are correctly detected (not just
those accepted by strtoimax(), but all matching "-?[0-9]+")?

> I originally thought of using "argument" instead of "operand", but got the
> better word from the above section of the man page.
> 
>> Without "-e" the numeric result is "undefined" and no error is signaled,
>> since there was no test whether the conversion succeeded before I added
>> it back in 2000.
> 
> I first thought that the error reporting must be delayed to when an operand
> is used in an expression, even with -e.  But it is already delayed, and
> the parsing works as specified in POSIX.  The parsing is just poorly or
> incorrectly documented in the man page.

Yes, the reporting is delayed, since any operand (even a number) might
be used in an RE matching operation (":"). I'll see whether I can find
better wording for the man page, but this might be easier for a native
speaker of English.

> - The syntax in the man page doesn't seem to mention the degenerate
>   expression <identifier>.  POSIX specifies this of course.  <identifier>
>   can be either <integer> or <string>, where <integer> is an optional
>   unary minus followed by digits, and <string> is any argument that is
>   not an <integer> and not an operator symbol.

Yes, that definition makes "+1" an invalid <integer> ...

>   Therefore, "expr -e 1000000000000000000" is not a syntax error as seems
>   to be required by the man page; the arg in it forms a degenerate
>   expression.  The arg is not a <string> since it is an <integer>.
>   Therefore, the expression is numeric (I didn't check that POSIX says
>   this explicitly).  Therefore, we are justified in applying strtoimax()
>   to all the operands in the expression (all 1 of them) and getting a
>   range error.

I think this is rational behaviour of the code, but I'm not convinced
that an <identifier> cannot be an <integer> and a <string> at the same time.
See for example:

$ expr \( 222 \) : "2*"
3
$ expr \( 111 + 111 \) : "2*"
3

The numeric result is taken as a string to perform the RE matching, and
the length of the first match is returned. Therefore, a clear integer
can still be interpreted as a string if it is an operand of a string
expression. The degenerate case of just an over-long integer might
still be considered a valid string, and the missing operation (or
identity?) does not change that type. Therefore, it might be argued
that a very long numeric argument should just be printed as a string in
that case. (???)

> - The man page is broken in saying that unrepresentable numeric operands
>   are parsed as strings instead.  Whether an operand is numeric is
>   determined by the POSIX syntax which is purely lexical and doesn't
>   depend on representability.  So for the degenerate expression with 1
>   operand, the type of the expression is determined by the type of the
>   operand which is determined lexically as described above.  Similarly

I do not fully agree that the type is determined in that way. It is
obvious that over-long numbers may still be valid strings (and can thus
be used as operands of the RE matching operator ":"). This clearly
shows that non-numeric arguments can be determined to be of type
string, but numeric arguments are of indeterminate type until used in
an operation (i.e. they are both a valid integer and a string, like a
mixed state in quantum mechanics ;-) ).

>   for parenthesized degenerate expressions.  For non-degenerate
>   expressions, the types of the operators and of the result are again
>   mostly or always determined lexically by the types of the results.
>   For example, '=' means equality of integers if both operands are
>   integers, but equality of strings if one or both operands is not
>   an integer.

Hmm, I think you are right in this case and numeric overflow should
occur (and be detected) if at least one operand is a large decimal
(outside the supported numeric range).
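
If we go that way, the dispatch for "=" (and the other comparison
operators) would look roughly like this, using the hypothetical
helpers from the sketch further above:

#include <inttypes.h>
#include <stdbool.h>
#include <string.h>

bool     is_posix_integer(const char *);   /* lexical check (see above) */
intmax_t to_integer(const char *);          /* conversion with range check */

/*
 * "=" compares as integers only when both operands are (lexically)
 * decimal integers; otherwise it falls back to string comparison.
 * With the stricter check, a too-large decimal operand is reported
 * as an overflow instead of silently being compared as a string.
 */
static int
is_equal(const char *a, const char *b)
{
        if (is_posix_integer(a) && is_posix_integer(b))
                return (to_integer(a) == to_integer(b));
        return (strcmp(a, b) == 0);
}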

> - In all cases, whether an operand is an integer is context-dependent,
>   so args must not be classified early.  This seems to be done correctly,
>   so the code conforms to the POSIX syntax although the man page doesn't.

I agree, but I'm still not sure whether "expr 10000000000000000000000"
should cause an integer overflow.

> The syntax is still broken as designed, since it doesn't allow +1 to
> be an integer, and it requires octal integers to be misinterpreted
> as decimal integers although no reasonable specification of decimal
> integers allows them to start with a '0', and it doesn't support
> non-decimal integers...

We *do* accept "+1" as synonymous with "1", though, as pointed out
above. I'd like to change this.

>>> - POSIX requires brokenness for bases other than 10, but I wonder if an
>>>   arg like 0x10 invokes undefined behaviour and thus can be made to
>>>   work.  (I wanted to use a hex number since I can never remember what
>>>   INTMAX_MAX is in decimal and wanted to type it in hex for checking
>>>   the range and overflow errors.)  Allowing hex args causes fewer
>>>   problems than allowing decimal args larger than INT32_MAX, since
>>>   they are obviously unportable.  Some FreeBSD utilities, e.g., dd,
>>>   support hex args and don't worry about POSIX restricting them.
>>
>> Does POSIX require that expr exits on illegal arguments?
> 
> Not sure.  It requires an exit status of 2 for invalid expressions.
> For the "+" operator, the operands are required to be decimal
> integers, but the error handling isn't so clearly specified.  For
> the "&" operator, operands(s) are allowed to be null.

There is an asymmetry in the handling of logical AND and OR, which may
or may not be mandated by POSIX:

$ expr "" \| ""

$ expr "" \& ""
0

In the case of OR, the second argument is returned if the first one is
"0" or "". In the example above, this results in "" being printed.

In the case of AND, "0" is returned if either argument is "0" or ""
(as both are in the example above).

This makes a difference if the results of logical operations are used
not as logical operands (where "" and "0" have the same value) but as
arguments to other (numeric or string) operations.
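
To make the asymmetry concrete, the current behaviour boils down to
roughly the following (a rough model only, not the actual code; the
real implementation also treats any integer spelling of zero, e.g.
"00", as false):

#include <stdbool.h>
#include <string.h>

/* An operand is "false" if it is the null string or zero. */
static bool
is_false(const char *s)
{
        return (s[0] == '\0' || strcmp(s, "0") == 0);
}

/* OR: the first operand if it is true, otherwise the second, whatever it is. */
static const char *
op_or(const char *a, const char *b)
{
        return (is_false(a) ? b : a);
}

/* AND: the first operand if both are true, otherwise the literal "0". */
static const char *
op_and(const char *a, const char *b)
{
        return ((is_false(a) || is_false(b)) ? "0" : a);
}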

Is this mandated by POSIX?
If not: Should we make the behaviour symmetric?

I prefer the way OR operates in that it always returns one of its
operands, the second one if both are FALSE.

For AND we should return the second operand if the first one evaluates
to TRUE, IMHO (if permitted by POSIX). But since the current behaviour
has existed in FreeBSD for at least 10 years, we might just keep it
(although it slightly complicates the code).

> Anyway, hex numbers can't be put through this gap.  Since they are not
> decimal integers, they are required to be interpreted as strings in
> some contexts.  So "expr 0x10 \< 2" gives 1 because the string "0" is
> less than the string "2".  This conflicts with "expr 16 \< 2" giving
> 0 since both operands are integers and of course 16 = 0x10 is not less
> than 2.

Too bad, actually ...

>>> - POSIX unfortunately requires args larger than INT32_MAX to be unportable
>>>   (to work if longs are longer than 32 bits, else to give undefined (?)
>>>   behaviour).  For portability there could be a -p switch that limits args
>>>   to INT32_MAX even if longs are longer than 32 bits.
>>
>> Well, undefined behaviour can always be to return the correct result ;-)
>>
>> I'd be willing to add "-p" (effectively just make "-e" the default that
>> can be overridden).
> 
> I now don't see any problem with -e.  Not even the one for degenerate
> expressions that I thought I saw.

That's great!

> POSIX says that shell expressions should be preferred to expr, and for
> shell expressions it has a non-null discussion of representability and
> overflows.  It basically says that only long arithmetic is supported,
> without even C's type suffixes which are needed to extend to unsigned
> long arithmetic, but extensions are encouraged.

Yes, I know that shell expressions are highly preferable to "expr" ...

BTW: /bin/sh does support a 64-bit numeric range in shell expressions
on 32-bit architectures, but no range checks are performed.

I just checked a few shells on i386:

SH:
$ echo $((2000000000000000000000 + 10000000000))
-9223372026854775809

BASH:
$ echo $((2000000000000000000000 + 10000000000))
7751640049368425472

ZSH:
$ echo $((2000000000000000000000 + 10000000000))
zsh: number truncated after 19 digits: 2000000000000000000000 + 10000000000
2000000010000000000

So: How about range checks for $(( )) in our /bin/sh ???
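
The check itself would not be much code. For addition it could look
like the following (just a sketch of the kind of check I mean, not a
patch against our sh):

#include <err.h>
#include <stdint.h>

/*
 * Overflow-checked addition on intmax_t: report an error instead of
 * silently wrapping around.
 */
static intmax_t
checked_add(intmax_t a, intmax_t b)
{
        if ((b > 0 && a > INTMAX_MAX - b) ||
            (b < 0 && a < INTMAX_MIN - b))
                errx(2, "arithmetic overflow: %jd + %jd", a, b);
        return (a + b);
}

The same kind of pre-check would be needed for subtraction,
multiplication and division (INTMAX_MIN / -1).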

Best regards, Stefan

