[Bug 166861] bsdgrep(1)/sed(1): bsdgrep -E and sed handle invalid {} constructs strangely

bugzilla-noreply at freebsd.org bugzilla-noreply at freebsd.org
Fri Apr 7 04:05:01 UTC 2017


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=166861

Kyle Evans <bsdports at kyle-evans.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bsdports at kyle-evans.net

--- Comment #2 from Kyle Evans <bsdports at kyle-evans.net> ---
(In reply to dubiousjim from comment #1)

A summary of the work needed is at the bottom; feel free to skip ahead and
refer back here only for intermediate results and notes.

Some relevant notes:

As of GNU grep 2.27, GNU sed 4.3 on Debian, and BSD grep @ r316566-ish:

(1) and (2): behavior of the two implementations seems to match.
(3) 
FreeBSD:
$ echo "a{1,2,3}b" | sed -r "s/{/_/"
a_1,2,3}b
$ echo "a{1,2,3}b" | sed -r "s/}/_/"
a{1,2,3_b

Debian:
$ echo "a{1,2,3}b" | sed -r "s/{/_/"
# Error, invalid preceding expression
# Whoops
$ echo "a{1,2,3}b" | sed -r "s/a{/_/"
# Error, unmatched \{
$ echo "a{1,2,3}b" | sed -r "s/}/_/"
a{1,2,3_b

We do have a test case for this at lib/libc/regex/grot/tests:205, where { is
explicitly meant to be a literal match in both BREs and EREs. We have no test
case covering } as a literal match.

FreeBSD:
$ echo "a{1,2,3}b" | sed "s/\}/_/"
# Error, parentheses not balanced

Debian:
$ echo "a{1,2,3}b" | sed "s/\}/_/"
a{1,2,3_b
# Ah, also prefer GNU behavior

It's worth noting that this one has no test either. There is the obvious test
for the other side, \{ alone, but none for \}.
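
To re-run the (3) cases against whatever sed is installed locally, something
like the following sketch may help. The probe function is a hypothetical
helper, not part of any test suite, and it assumes a sed that accepts -E for
EREs (the transcripts above use -r). It only reports whether the pattern
compiled and what the substitution produced; results will differ between
implementations, which is the point.

```sh
#!/bin/sh
# probe MODE PATTERN: report whether the local sed accepts PATTERN.
# MODE is "ere" (sed -E) or anything else for a plain BRE.
# Hypothetical helper for illustration only.
probe() {
    mode=$1
    pattern=$2
    case $mode in
        ere) flag=-E ;;
        *)   flag= ;;
    esac
    # A pattern that fails to compile makes sed exit non-zero.
    if out=$(printf '%s\n' 'a{1,2,3}b' | sed $flag "s/$pattern/_/" 2>/dev/null); then
        printf '%s %-4s accepted -> %s\n' "$mode" "$pattern" "$out"
    else
        printf '%s %-4s rejected\n' "$mode" "$pattern"
    fi
}

probe ere '{'
probe ere '}'
probe bre '\}'
```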

(4)
FreeBSD:
$ echo "a{1,2,3}b" | sed -r "s/{}/_/"
a{1,2,3}b

Debian:
$ echo "a{1,2,3}b" | sed -r "s/{}/_/"
# Error, invalid preceding expression
# Whoops
$ echo "a{1,2,3}b" | sed -r "s/a{}/_/"
# Error, invalid content
# Reasonable

This one is, technically, correct behavior: according to re_format(7), the
character following the "{" is "}", which is *not* a digit, and therefore this
is not a bound. I don't think that is the right outcome, though. Giving {} a
literal interpretation leaves too much room for silent errors when the
pattern's author meant to supply a digit, and I would prefer the GNU approach
on this matter.

We'll probably want to update re_format(7) to be more explicit on this matter,
as well as add a corresponding test case.
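
As a rough illustration of the re_format(7) rule in question (a { followed by
a non-digit is an ordinary character, not the start of a bound), here is a
small sketch. brace_kind is a hypothetical name, and the function only mirrors
the documented rule; it is not how the libc regex parser is written.

```sh
#!/bin/sh
# brace_kind TEXT: TEXT is the part of an ERE beginning at a '{'.
# Classifies the brace by the documented rule only (hypothetical helper).
brace_kind() {
    case $1 in
        '{'[0-9]*) echo bound ;;         # digit follows: start of a bound
        '{'*)      echo literal ;;       # anything else: ordinary character
        *)         echo 'not a brace' ;;
    esac
}

brace_kind '{1,2}'
brace_kind '{}'
brace_kind '{,3}'
```

Under that rule, only the first call is classified as a bound; {} and {,3}
both fall through to the literal case, which is the behavior seen above.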

(5)
FreeBSD:
$ echo "a{1,2,3}b" | sed -r "s/)/_/"
a{1,2,3}b
$ echo "a{1,2,3}b" | sed "s/\)/_/"
# Error, parentheses not balanced

This is clearly covered at tests:54 (though the test is silenced), with slight
anger expressed in the context around it. I lean towards taking the GNU/sane
approach on this one and making this work as one probably expects nowadays.


===== Summary of work needed

(3)
Problem: { in ERE uses literal interpretation
Needed: { throw error
Needed: Fix test case at tests:205 to separate out BRE and ERE cases and adjust
ERE case to meet expectations

Problem: \} in BRE throws an error
Needed: \} match literal


(4)
Problem: {} in ERE uses literal interpretation
Needed: {} throw error
Needed: Consider re_format(7) update to explicitly note {} as illegal
Needed: Test case


(5)
Problem: ) in ERE uses literal interpretation
Needed: ) throw error
Needed: Adjust test cases (tests:54)
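
The "Needed" items in this summary could be tracked with a quick checklist
script along these lines. This is only a sketch: check is a hypothetical
helper, it assumes sed accepts -E for EREs, and it tests only whether a
pattern compiles, not what the substitution outputs. It is not a replacement
for the grot test cases.

```sh
#!/bin/sh
# check MODE PATTERN WANT: compare the local sed with the desired outcome.
# MODE is "ere" or "bre"; WANT is "error" or "ok". Hypothetical helper.
check() {
    flag=
    [ "$1" = ere ] && flag=-E
    if printf 'a{1,2,3}b\n' | sed $flag "s/$2/_/" >/dev/null 2>&1; then
        got=ok
    else
        got=error
    fi
    if [ "$got" = "$3" ]; then verdict=PASS; else verdict=FAIL; fi
    printf '%s: %s %s (want %s, got %s)\n' "$verdict" "$1" "$2" "$3" "$got"
}

check ere '{'  error   # (3) { in ERE should throw an error
check bre '\}' ok      # (3) \} in BRE should match literally
check ere '{}' error   # (4) {} in ERE should throw an error
check ere ')'  error   # (5) ) in ERE should throw an error
```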


I think that sums it up -- I'll take a look at these things in the next week or
so.


