bin/162468: expr(1) false syntax errors
Jilles Tjoelker
jilles at stack.nl
Sat Nov 12 00:00:30 UTC 2011
The following reply was made to PR bin/162468; it has been noted by GNATS.
From: Jilles Tjoelker <jilles at stack.nl>
To: Eugene Grosbein <egrosbein at rdtc.ru>
Cc: bug-followup at FreeBSD.org
Subject: Re: bin/162468: expr(1) false syntax errors
Date: Sat, 12 Nov 2011 00:52:59 +0100
On Sat, Nov 12, 2011 at 01:58:55AM +0700, Eugene Grosbein wrote:
> 11.11.2011 22:44, Jilles Tjoelker wrote:
> >> [expr treats any string that looks like an operator as an operator,
> >> for example, expr '>' : '.*' fails]
> > The current behaviour of expr is allowed by POSIX (SUSv4, XCU 4
> > Utilities, expr). If the application passes '>', this is not a string
> > operand but an operator, even if that results in an invalid expression.
> > This is also documented in the man page.
> Yes. But I have reports that NetBSD's and Linux's expr(1)
> both work as expected.
> > It would be a valid extension to allow such expressions but it is not
> > immediately clear how it would work. For example, should
> > expr \( = \)
> > compare two strings ("0") or return a single string ("=")? And should
> > expr \( + \)
> > return "+" or raise an error?
> It would be wise to take a look at more robust expr(1) implementations
> and try to keep compatibility.
For '<', your example may work. The expr from GNU coreutils 7.4
definitely fails your example for '(', ')' and '+'. In the case of '+',
they added a unary plus operator that takes the next argument as a
literal even if it looks like an operator so "fixing" it would be ugly.
GNU expr also has "match", "substr", "index" and "length" operators.
Trying some more, GNU expr appears inconsistent and unpredictable: it
will accept strings that have the form of an operator as strings in some
cases but not all and it is unclear why.
NetBSD's expr supports the "length" operator that we do not, but not
"match", "substr" or "index". It appears to try fairly hard to make
wrong input work anyway. For example, it will treat an initial "--" as a
string (rather than an end-of-options marker) if the next argument is
not an operator. It also gives yacc the alternative to treat any
operator except parentheses as a string instead. Because of the
one-token lookahead of a yacc parser, this does not, however, allow it
to recognize all possible expressions with such operators as strings.
For example, if the first two tokens are "length" "<", it may be
necessary to read all input to decide which of the two is an operator
(consider the case where the subsequent tokens are zero or more colons).
NetBSD's approach will lead to inconsistent results if we ever need to
extend expr (such as with GNU's named operators). The extension will
change the meaning of some expressions in an unpredictable way. One way
to handle this is to add the GNU cruft; it is unlikely that expr's
syntax will be extended ever again given that it is mostly a legacy
tool. The GNU extensions are ugly, though.
If it is accepted that parentheses are always special (which GNU and
NetBSD expr appear to do, and which is one way to resolve expr \( = \)
ambiguity) and that there are no named operators or GNU unary "+", then
there are only binary operators and the first, third, fifth, ...
arguments excluding parentheses must be operands while the second,
fourth, sixth, ... must be operators.
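The positional rule above can be sketched in sh. This is only an
illustration of the classification, not code from any expr
implementation; the function name classify_args is hypothetical, and it
assumes the restrictions stated above (parentheses always special, no
named operators, no unary "+").

```shell
# Hypothetical sketch: label each argument by position alone.
# Parentheses are always special and do not count toward the position.
classify_args() {
    pos=0
    for arg in "$@"; do
        case $arg in
        '('|')')
            printf 'paren: %s\n' "$arg"
            continue ;;
        esac
        pos=$((pos + 1))
        if [ $((pos % 2)) -eq 1 ]; then
            # first, third, fifth, ... non-parenthesis argument
            printf 'operand: %s\n' "$arg"
        else
            # second, fourth, sixth, ... non-parenthesis argument
            printf 'operator: %s\n' "$arg"
        fi
    done
}

classify_args '>' : '.*'
```

Under this rule the '>' in the first position is an operand simply
because of where it stands, so the example from the PR would no longer
be a syntax error.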
> > The test utility is different in that POSIX specifies how a similar
> > ambiguity shall be resolved (for a limited set of cases).
A similar approach could be applied to expr (e.g. if there are three
arguments and the second is ":" then it is defined to be a matching
expression without going into the grammar). The assumption is that
expressions written without care for strings that look like operators
will be very simple (one operator only).
> > Oh, and if you want to find a string length in a shell script, why don't
> > you just use
> > ${#VAR}
> > (given that the string is in $VAR)? If you must use expr(1), do
> > expr \( "x$VAR" : '.*' \) - 1
> > as described in the man page.
> That's just a simple test case. In fact, I need not string length
> but evaluate regexp that has ()'s:
> read string < file
> expr -- "$string" : 'Key: \(.*\)'
    read string < file
    case $string in
    "Key: "*)
        printf '%s\n' "${string#Key: }" ;;
    *)
        echo
        false ;;
    esac
(Of course, all the printf and false mess is likely unnecessary in a
real script, but this matches your command very closely.)
A limitation is that the case command and the #/##/%/%% substitutions
work with shell patterns which are weaker than even basic regular
expressions.
> When $string starts with '>' this fails (and $string may start with '>').
It should only fail if $string is exactly '>' or '>='.
> I've found a workaround: expr -- "x$string" : 'xKey: \(.*\)'
> But that's only workaround, not good solution.
This is not really a workaround; it is the proper way to use expr. That
is how poor the design of expr is.
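As a minimal sketch of that usage, the "x" prefix can be wrapped in a
small function. The name key_value is hypothetical and the "Key: "
pattern is taken from the example above; the point is only that the
string operand can never be exactly '>', '(' or another operator once
it starts with "x".

```shell
# Hypothetical helper: print the text following "Key: " in the argument.
# The "x" prefix on both the string and the pattern keeps expr from
# treating the string as an operator.
key_value() {
    expr "x$1" : 'xKey: \(.*\)'
}

key_value 'Key: >'                  # prints ">"
key_value '>' || echo 'no match'    # prints an empty line, then "no match"
```

Note that expr also exits with status 1 when the matched text is empty
or "0", so the exit status alone does not distinguish "no match" from
such a value.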
--
Jilles Tjoelker