bin/162468: expr(1) false syntax errors

Jilles Tjoelker jilles at stack.nl
Sat Nov 12 00:00:30 UTC 2011


The following reply was made to PR bin/162468; it has been noted by GNATS.

From: Jilles Tjoelker <jilles at stack.nl>
To: Eugene Grosbein <egrosbein at rdtc.ru>
Cc: bug-followup at FreeBSD.org
Subject: Re: bin/162468: expr(1) false syntax errors
Date: Sat, 12 Nov 2011 00:52:59 +0100

 On Sat, Nov 12, 2011 at 01:58:55AM +0700, Eugene Grosbein wrote:
 > 11.11.2011 22:44, Jilles Tjoelker wrote:
 > >> [expr treats any string that looks like an operator as an operator,
 > >> for example, expr '>' : '.*' fails]
 
 > > The current behaviour of expr is allowed by POSIX (SUSv4, XCU 4
 > > Utilities, expr). If the application passes '>', this is not a string
 > > operand but an operator, even if that results in an invalid expression.
 > > This is also documented in the man page.
 
 > Yes. But I have reports that NetBSD's and Linux's expr(1)
 > both work as expected.
 
 > > It would be a valid extension to allow such expressions but it is not
 > > immediately clear how it would work. For example, should
 > >   expr \( = \)
 > > compare two strings ("0") or return a single string ("=")? And should
 > >   expr \( + \)
 > > return "+" or raise an error?
 
 > It would be wise to take a look at more robust expr(1) implementations
 > and try to keep compatibility.
 
 For '<', your example may work. The expr from GNU coreutils 7.4
 definitely fails your example for '(', ')' and '+'. In the case of '+',
 they added a unary plus operator that takes the next argument as a
 literal even if it looks like an operator so "fixing" it would be ugly.
 GNU expr also has "match", "substr", "index" and "length" operators.
 Trying some more, GNU expr appears inconsistent and unpredictable: it
 will accept strings that have the form of an operator as strings in some
 cases but not all and it is unclear why.
 
 NetBSD's expr supports the "length" operator that we do not, but not
 "match", "substr" or "index". It appears to try fairly hard to make
 wrong input work anyway. For example, it will treat an initial "--" as a
 string (rather than an end-of-options marker) if the next argument is
 not an operator. It also gives yacc the alternative to treat any
 operator except parentheses as a string instead. Because of the
 one-token lookahead of a yacc parser, this does not, however, allow it
 to recognize all possible expressions with such operators as strings.
 For example, if the first two tokens are "length" "<", it may be
 necessary to read all input to decide which of the two is an operator
 (consider the case where the subsequent tokens are zero or more colons).
 
 NetBSD's approach will lead to inconsistent results if we ever need to
 extend expr (such as with GNU's named operators). The extension will
 change the meaning of some expressions in an unpredictable way. One way
 to handle this is to add the GNU cruft; it is unlikely that expr's
 syntax will be extended ever again given that it is mostly a legacy
 tool. The GNU extensions are ugly, though.
 
 If it is accepted that parentheses are always special (as GNU and
 NetBSD expr appear to treat them, and which is one way to resolve the
 expr \( = \) ambiguity) and that there are no named operators or GNU
 unary "+", then there are only binary operators: the first, third,
 fifth, ... arguments (not counting parentheses) must be operands, while
 the second, fourth, sixth, ... must be operators.
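
 As a rough illustration of that positional rule, the following sketch
 checks whether an argument list fits it. The function name is made up
 for this example, the operator list is the POSIX set, and no existing
 expr implementation is claimed to work this way.

```shell
# Hypothetical checker, not part of any expr implementation.
# Returns 0 if, ignoring "(" and ")", the arguments alternate
# operand, operator, operand, ... and end with an operand.
alternates() {
	pos=0
	for arg in "$@"; do
		case $arg in
		'('|')') continue ;;	# parentheses are always special; skip them
		esac
		pos=$((pos + 1))
		if [ $((pos % 2)) -eq 0 ]; then
			# Even positions must hold a binary operator.
			case $arg in
			'|'|'&'|'='|'>'|'>='|'<'|'<='|'!='|'+'|'-'|'*'|'/'|'%'|':') ;;
			*) return 1 ;;
			esac
		fi
		# Odd positions are operands, whatever they look like.
	done
	# A complete expression has an odd number of non-parenthesis arguments.
	[ $((pos % 2)) -eq 1 ]
}
```

 Under this rule, the "=" in expr \( = \) sits in the first position and
 would therefore be taken as a string.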
 
 > > The test utility is different in that POSIX specifies how a similar
 > > ambiguity shall be resolved (for a limited set of cases).
 
 A similar approach could be applied to expr (e.g. if there are three
 arguments and the second is ":" then it is defined to be a matching
 expression without going into the grammar). The assumption is that
 expressions written without care for strings that look like operators
 will be very simple (one operator only).
 
 > > Oh, and if you want to find a string length in a shell script, why don't
 > > you just use
 > >   ${#VAR}
 > > (given that the string is in $VAR)? If you must use expr(1), do
 > >   expr \( "x$VAR" : '.*' \) - 1
 > > as described in the man page.
 
 > That's just a simple test case. In fact, I need not string length
 > but evaluate regexp that has ()'s:
 
 > read string < file
 > expr -- "$string" : 'Key: \(.*\)'
 
 read string < file
 case $string in
 "Key: "*)
 	printf '%s\n' "${string#Key: }" ;;
 *)
 	echo
 	false ;;
 esac
 
 (Of course, all the printf and false mess is likely unnecessary in a
 real script, but this matches your command very closely.)
 
 A limitation is that the case command and the #/##/%/%% substitutions
 work with shell patterns which are weaker than even basic regular
 expressions.
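
 For instance, a check for "one or more digits" is direct with a basic
 regular expression but has to be turned around with shell patterns. A
 small sketch, with helper names invented for this example:

```shell
# Hypothetical helpers for illustration only.

# BRE version: one or more digits, anchored at the end with "$".
# The leading "x" keeps the first operand from looking like an operator.
is_digits_expr() {
	expr "x$1" : 'x[0-9][0-9]*$' >/dev/null
}

# Shell pattern version: patterns have no "one or more" repetition,
# so test the negation instead: reject the empty string and anything
# that contains a non-digit.
is_digits_case() {
	case $1 in
	''|*[!0-9]*) return 1 ;;
	*) return 0 ;;
	esac
}
```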
 
 > When $string starts with '>' this fails (and $string may start with '>').
 
 It should only fail if $string is exactly '>' or '>='.
 
 > I've found a workaround: expr -- "x$string" : 'xKey: \(.*\)'
 > But that's only a workaround, not a good solution.
 
 This is not really a workaround; it is the proper way to use expr. That
 is how poor the design of expr is.
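
 Wrapped in a function, the prefixed form could look like the sketch
 below. The function name is hypothetical; the expr invocation is the
 one quoted above without the "--", which the "x" prefix makes
 unnecessary.

```shell
# Hypothetical helper: print whatever follows "Key: " in $1.
# The exit status is that of expr, so it is non-zero when there is no
# match and also when the captured value is empty or zero.
key_value() {
	# With "x" in front on both sides, the first operand can never be
	# mistaken for an operator such as ">" or ">=", or for an option.
	expr "x$1" : 'xKey: \(.*\)'
}
```

 For example, key_value "$string" prints an empty line instead of
 reporting a syntax error when $string is exactly '>'.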
 
 -- 
 Jilles Tjoelker

