RESENT with patch ... Re: [RFC] Consistent numeric range for "expr" on all architectures

Bruce Evans brde at
Fri Jul 1 01:09:42 UTC 2011

On Thu, 30 Jun 2011, Stefan Esser wrote:

> Am 30.06.2011 18:50, schrieb David Schultz:
>> On Thu, Jun 30, 2011, Stefan Esser wrote:
>>>  int
>>> +is_integer(const char *s)
>>> +{
>>> +	if (*s == '-')
>>> +		s++;
>>> +	while (isdigit(*s))
>>> +		s++;
>>> +	return *s == '\0';
>>> +}
>> I only glanced at the patch for a few seconds, but your
>> is_integer() routine will accept a bare minus sign ("-") as an
>> integer.

It also invokes undefined behaviour by applying isdigit() to a value
that is not necessarily representable as an unsigned char or equal
to the value EOF, and gives wrong behaviour if there is a digit
whose value as a plain char promotes to an int with the value EOF
(perhaps the latter can't happen).  strto*() is more careful.

It also has a style bug (missing silly parentheses around return value).

Does you current version use only the above to classify integers, so
that leading whitespace is not permitted?  "expr [-e] ' 2' + ' 2'"
now gives 4 since strtoimax() ignores the leading whitespace, but I
think the syntax doesn't allow this.  This is another example of
why expr's syntax is broken as designed.  Shell arithmetic seems to
ignore leading whitespace.  This might be just because the whitespace
is kept when a variable is expanded, but has no significance in
shell expressions.

Trailing whitespace is already disallowed in expr.  This is presumably
just because strto*() stops parsing digits at the whitespace and then
the check for trailing garbage treats even whitespace as garbage.
Shell arithmetic seems to ignore trailing whitespace.

> I mentioned in the message, that I do not test for empty numeric
> operands. I had considered:

This seems to be another bug in old expr.  It gives 4 for
"expr [-e] 2 + '' + 2".

> int
> is_integer(const char *s)
> {
>        if (*s == '-')
>                s++;
>        if (!isdigit(*s++)
>                return 0;
>        while (isdigit(*s))
>                s++;
>        return *s == '\0';
> }
> But this will break script that do
> 	while :
> 	do
> 		i=`expr "$i" + 1`
> 	done
> without first setting i=0. And I assume, such scripts exist in huge numbers.

Eeek,  Doesn't the syntax require them to break?  The resulting expression
is similar to unary plus, which is useful for similar things.

> But "-" will not be accepted as operand (neither integer nor string).
> Didn't I give the example "expr -- - : -" which fails with a syntax
> error?
> Therefore, I do not need to check for actual digits in the input (and
> thus may accept "" as 0 for backwards compatibility), but "-" will not
> be allowed as parameter to expr by the parser ("syntax error").

I tried a Linux expr (from a FreeBSD package in 2002): the following are
all syntax errors ("non-numeric argument"):
     '+1' + 1 (unary plus)
     ' 1' + 1 (leading whitespace)
     '1 ' + 1 (trailing whitespace)
     ''   + 1 (no digits)
The following wrap silently, but not as signed long (this is on i386, but
int64_t is apparently used):
     1111111111111111111 + 1111111111111111111 ->  2222222222222222222
     9223372036854775806 + 1                   ->  9223372036854775807
     9223372036854775806 + 2                   -> -9223372036854775808

We have the compat mode for the broken scripts, but since they would have
broken in Linux in 2002, hopefully there aren't any.  Someone should
check a current Linux expr.  I guess it hasn't changed for unary plus
and whitepsace, since it was perfectly broken to POSIX spec in 2002
and the spec hasn't changed.  It might have changed the behaviour on
overflow (say to make it nonsilent), since the spec doesn't require any
particular brokenness then.


More information about the freebsd-standards mailing list