Fwd: [RFC] Consistent numeric range for "expr" on all architectures

Fri Jul 1 02:24:14 UTC 2011

On Thu, 30 Jun 2011, Garrett Wollman wrote:

> <<On Thu, 30 Jun 2011 19:06:03 +0200, Stefan Esser <se at freebsd.org> said:
>
>> Well, that's the frustrating part: I had implemented 64bit range in expr
>> back in 2000, but this extended range (on 32bit archs) has been made
>> optional, some two years later with the commit message you quote.
>
> The 2001 POSIX standard states:
>
>> Integer variables and constants, including the values of operands
>> and option-arguments, used by the standard utilities listed in this
>> volume of IEEE Std 1003.1-2001 shall be implemented as equivalent to
>> the ISO C standard signed long data type; floating point shall be

Thus, overflow can occur, and the behaviour on overflow is undefined.

>> implemented as equivalent to the ISO C standard double
>> type. Conversions between types shall be as described in the ISO C
>> standard. All variables shall be initialized to zero if they are not
>> otherwise assigned by the input to the application.
>
>> Arithmetic operators and control flow keywords shall be implemented
>> as equivalent to those in the cited ISO C standard section, as
>> listed in Table 1-2 (on page 8).
>
> However, for arithmetic expansion in the shell (but *not* the expr
> utility), it says:
>
>> As an extension, the shell may recognize arithmetic expressions
>> beyond those listed. The shell may use a signed integer type with a
>> rank larger than the rank of signed long. The shell may use a
>> real-floating type instead of signed long as long as it does not
>> affect the results in cases where there is no overflow.

The part about recognizing different syntaxes is an extension and
requires explicit permission.  The part about "using" a type with rank
larger than that of signed long requires no explicit permission, since
it only affects the result if there is overflow, and then the behaviour
is undefined anyway.  So any utility can do the latter.

The as-if rule always allowed any utility to use any type to implement
arithmetic, provided that the final result is no different if it is
defined.  Thus for example, as an implementation detail real-double
can be used on i386 but not on amd64, since double has 53 value bits
on both, while signed long has fewer (31) value bits on i386 and more
(63) value bits on amd64.  long double could be used on amd64.  When
the FP arithmetic produces an unrepresentable preliminary result like
Inf or NaN due to overflow or worse, conversion of that result to
signed long gives undefined behaviour (typically a garbage value, with
the garbage being amazingly inconsistent depending on the sizes of the
FP and integer types in the conversion).
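
To make the value-bit comparison concrete, here is a rough sketch in C.
The names double_covers_long and roundtrips_through_double are made up
for illustration and are not from expr.  It assumes binary floating
point, two's complement, and no padding bits in long.

```c
#include <assert.h>
#include <float.h>
#include <limits.h>

/* The sketch counts mantissa bits, so it assumes binary floating point. */
static_assert(FLT_RADIX == 2, "binary floating point assumed");

/* Value bits of signed long, assuming no padding bits:
 * 31 on i386, 63 on amd64. */
#define LONG_VALUE_BITS ((int)(sizeof(long) * CHAR_BIT) - 1)

/* Nonzero if double can hold every long exactly, so that the as-if
 * rule permits doing long arithmetic in double. */
int double_covers_long(void)
{
    return DBL_MANT_DIG >= LONG_VALUE_BITS;
}

/* Nonzero if v survives conversion to double and back. */
int roundtrips_through_double(long v)
{
    double d = (double)v;
    /* 2^31 or 2^63: exact in double (two's complement assumed). */
    double limit = -(double)LONG_MIN;

    /* Converting an out-of-range double to long is undefined,
     * so the range must be checked before the cast. */
    if (d >= limit || d < -limit)
        return 0;
    return (long)d == v;
}
```

With a 64-bit long, LONG_MAX rounds up to 2^63 when converted to double,
so the range check rejects it before the undefined conversion back is
ever attempted.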

> The language in the 2008 version of the standard is the same.
>
> For expr, the following definitions are relevant (from XCU7 page
> 2715):
>> integer An argument consisting only of an (optional) unary minus
>>         followed by digits.
>> string  A string argument; see below.
> [...]
>> A string argument is an argument that cannot be identified as an
>> integer argument or as one of the expression operator symbols shown
>> in the OPERANDS section.

Found another bug in FreeBSD expr: it accepts leading whitespace in
integers (since strto*() does), but the above doesn't allow this.
Leading whitespace might even be useful for turning numbers into
strings (it is sort of the reverse of turning strings into numbers
in awk and even in expr by adding 0 to them).  The FreeBSD man page
of course gives no detail about this requirement of the syntax.
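
A strict check matching the quoted definition (optional unary minus
followed by digits) could look roughly like the following.  The name
is_expr_integer is hypothetical; this is only a sketch of the rule, not
the code in FreeBSD expr.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Nonzero if s is an "integer" argument in the sense quoted above:
 * an optional unary minus followed by one or more digits, and
 * nothing else (no leading whitespace, no '+', no trailing junk). */
int is_expr_integer(const char *s)
{
    assert(s != NULL);

    if (*s == '-')
        s++;
    if (*s == '\0')
        return 0;               /* "" and "-" are not integers */
    for (; *s != '\0'; s++) {
        if (!isdigit((unsigned char)*s))
            return 0;
    }
    return 1;
}
```

Running this check before calling strto*() would reject " 42", which
strtol() by itself accepts because it skips leading whitespace.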

Found another bug of commission in the man page:

%      Arithmetic operations are performed using signed integer math.  If the -e
%      flag is specified, arithmetic uses the C intmax_t data type (the largest
%      integral type available), and expr will detect arithmetic overflow and
%      return an error indication.  If a numeric operand is specified which is
%      so large as to overflow conversion to an integer, it is parsed as a
%      string instead.  If -e is not specified, arithmetic operations and pars-
%      ing of integer arguments will overflow silently according to the rules of
%      the C standard, using the long data type.

The rule of the C standard is that overflow gives undefined behaviour, not
silence.

Bruce