bug with special bracket expressions in regular expressions
Andriy Gapon
avg at FreeBSD.org
Mon Sep 2 16:46:28 UTC 2013
on 02/09/2013 17:54 Andriy Gapon said the following:
>
> re_format(7) says:
> There are two special cases‡ of bracket expressions: the bracket expressions
> ‘[[:<:]]’ and ‘[[:>:]]’ match the null string at the beginning and
> end of a word respectively. A word is defined as a sequence of word
> characters which is neither preceded nor followed by word characters. A
> word character is an alnum character (as defined by ctype(3)) or an
> underscore. This is an extension, compatible with but not specified by
> IEEE Std 1003.2 (“POSIX.2”), and should be used with caution in software
> intended to be portable to other systems.
>
> However I observe the following:
> $ echo "cd0 cd1 xx" | sed 's/cd[0-9][^ ]* *//g'
> xx
> $ echo "cd0 cd1 xx" | sed 's/[[:<:]]cd[0-9][^ ]* *//g'
> cd1 xx
>
> In my opinion '[[:<:]]' should not affect how the pattern is matched in this case.
It seems that the code works like this:
- first it matches "cd0 " and "removes" it
- then it passes the remainder, "cd1 xx", for matching with a flag that says
  this is not the real start of the string
- thus the matching code
  o knows that this is not the real line start, so it can't match [[:<:]]
    on that basis alone
  o does _not_ know what character preceded the start of the given
    substring, so it cannot tell whether [[:<:]] could match there
So the match fails.
Not sure if this is an internal problem of regex(3) or a problem of how sed(1)
uses regex(3).
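
To make the suspected sequence concrete, here is a rough Python model of it.
This is only a sketch of my guess, not the actual sed(1) or regex(3) code.
Python has no [[:<:]], so \b(?=\w) stands in for "start of a word", and the
function names and the not_bol flag are made up for illustration.

```python
import re

# Stand-in for the FreeBSD pattern [[:<:]]cd[0-9][^ ]* *
# (assumption: \b(?=\w) approximates "start of a word").
WORD_START_CD = re.compile(r"\b(?=\w)cd[0-9][^ ]* *")


def delete_all_without_context(line):
    """Model of the suspected behaviour: each retry sees only the remainder."""
    kept = []
    rest = line
    not_bol = False  # hypothetical flag: "rest" does not begin the real line
    while rest:
        m = WORD_START_CD.search(rest)
        if m is not None and not_bol and m.start() == 0:
            # The preceding character is unknown here, so a word start at
            # offset 0 is refused; retry from offset 1 instead.
            m = WORD_START_CD.search(rest, 1)
        if m is None:
            break
        kept.append(rest[:m.start()])
        rest = rest[m.end():]
        not_bol = True
    return "".join(kept) + rest


def delete_all_with_context(line):
    """Alternative: search the full line from an offset, keeping the context."""
    kept = []
    pos = 0
    while True:
        m = WORD_START_CD.search(line, pos)
        if m is None:
            break
        kept.append(line[pos:m.start()])
        pos = m.end()
    return "".join(kept) + line[pos:]


print(delete_all_without_context("cd0 cd1 xx"))  # cd1 xx
print(delete_all_with_context("cd0 cd1 xx"))     # xx
```

The first function reproduces the output I see from sed, and the second gives
the output I would expect, because the matcher can still look at the character
just before the search offset.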
--
Andriy Gapon