Regex Wizards
Wayne Sierke
ws at au.dyndns.ws
Tue Sep 27 06:21:49 UTC 2011
On Mon, 2011-09-26 at 22:02 -0400, grarpamp wrote:
> Under the ERE implementation in RELENG_8, I'm having
> trouble figuring out how to group and backreference this.
>
> Given a line, where:
> If AAA is present, CCC will be too, and B may appear in between.
> If AAA is not present, neither CCC nor B will be present.
> DDDD is always present.
> Junk may be present.
> Match good lines and output in chunks.
>
> echo junkAAAABCCCDDDDjunk | \
>
> This works as expected:
> sed -E -n 's,^.*(AAAB?CCC)(DDDD).*$,1 \1 2 \2,p'
> 1 AAABCCC 2 DDDD
>
> But making the leading bits optional per spec does not work:
> sed -E -n 's,^.*(AAAB?CCC)?(DDDD).*$,1 \1 2 \2,p'
> 1 2 DDDD
>
> Nor does adding the usual grouping parens:
> sed -E -n 's,^.*((AAAB?CCC)?)(DDDD).*$,1 \1 2 \2,p'
> 1 2
>
> How do I group off the leading bits?
> Or is this a limitation of ERE's?
> Or a bug?
> Thanks.
I believe the problem is the greediness of the leading '.*'. Once the
first group is optional it is allowed to match nothing, so the '.*'
consumes the AAAB?CCC text itself and \1 comes back empty.
This seems to work:
sed -E -n -e '/AAAB?CCC/!s,.*(DDDD).*,1 \1,p' -e 's,.*(AAAB?CCC)(DDDD).*,1 \1 2 \2,p'
%echo junkAABCCCDDDDjunk | sed ...
1 DDDD
%echo junkAAAABCCCDDDDjunk | sed ...
1 AAABCCC 2 DDDD
%echo junkAAAACCCDDDDjunk | sed ...
1 AAACCC 2 DDDD
%echo junkAAAABCCDDDDjunk | sed ...
1 DDDD
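For anyone who wants to reproduce this, here is a small sketch that wraps both forms in shell functions. The function names greedy and chunks are made up for illustration; only the sed expressions come from this thread. It assumes a sed that accepts -E (FreeBSD sed, or a reasonably recent GNU sed).

```shell
#!/bin/sh
# Hypothetical helper names; the sed expressions are the ones from the thread.

# Form from the question: the leading '.*' is greedy, so the optional
# group is left matching nothing and \1 is empty.
greedy() {
    sed -E -n 's,^.*(AAAB?CCC)?(DDDD).*$,1 \1 2 \2,p'
}

# Two-expression workaround:
#   1st -e: runs only on lines that do NOT contain AAAB?CCC, prints DDDD alone.
#   2nd -e: requires the AAAB?CCC group, so it can no longer be skipped.
chunks() {
    sed -E -n \
        -e '/AAAB?CCC/!s,.*(DDDD).*,1 \1,p' \
        -e 's,.*(AAAB?CCC)(DDDD).*,1 \1 2 \2,p'
}

# Run the sample lines from this message through the workaround.
for line in junkAABCCCDDDDjunk junkAAAABCCCDDDDjunk \
            junkAAAACCCDDDDjunk junkAAAABCCDDDDjunk; do
    printf '%s\n' "$line" | chunks
done
```

The point of splitting the work is that neither expression needs an optional group: the address negation decides up front which case applies, so the greedy '.*' has nothing it can steal.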
Wayne
More information about the freebsd-questions mailing list