Replacing libgnuregex

Kyle Evans kevans91 at ksu.edu
Fri Apr 14 18:56:13 UTC 2017


On Tue, Apr 11, 2017 at 3:20 PM, Kyle Evans <kevans91 at ksu.edu> wrote:

>
> On the other hand, I think I could fairly easily implement most of these
> into libc/regex. Here's a summary of what this option entails adding to
> libc/regex, from what I've found:
>
> * Empty subexpressions(*)
> * Add missing quantifiers to BREs: \?, \+
> * Add branching to BREs: \|
> * Add backreferences (\1 through \9) to EREs
> * Add \w, \W, \s, and \S corresponding to [[:alnum:]], [^[:alnum:]],
> [[:space:]], and [^[:space:]] respectively
> * Add word boundaries and anchors:
> ** \b: word boundary
> ** \B: not word boundary
> ** \<: Start of word
> ** \>: End of word
> ** \`: Start of subject string
> ** \': End of subject string
>
> (*) I didn't find anything explicitly describing this as a GNU extension,
> but using it is certainly not POSIX-conformant. It gets used a tiny bit in
> some ports, and bsdgrep(1) implements a workaround for the simplest case,
> the empty expression (""), so that it matches everything and produces
> zero-length matches.
>
> The main benefit of this is not having to maintain a completely separate
> regex parser and the potential for inconsistencies that come along with it.
> The downside is that it would seem to promote expressions that are not
> strictly POSIX conformant. Is this a problem? Is this a problem worth
> worrying about?
>
>
FYI: a patch showing what implementing all of the above in libc/regex looks
like is available at [1]. Some cleanup is still in order and the test set is
not exhaustive, but it should implement all of the GNU extensions listed
above, and it's at least functional.
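
For comparison, two of the BRE additions in the quoted list (\+ and \?)
already have strictly POSIX spellings using interval expressions, \{1,\} and
\{0,1\}. The following is only a sketch to illustrate that equivalence; it is
not taken from the patch, and the helper name bre_matches and the two pattern
constants are made up for this example:

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: return 1 if the POSIX basic regular expression
 * `pattern` matches `subject`, 0 if it does not, and -1 if regcomp()
 * rejects the pattern.
 */
static int
bre_matches(const char *pattern, const char *subject)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && subject != NULL);

    /* Without REG_EXTENDED, regcomp() parses BRE syntax. */
    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return (-1);
    rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    /* Anything other than a successful match is reported as 0. */
    return (rc == 0 ? 1 : 0);
}

/* Strict-POSIX spellings of the GNU BRE quantifiers \+ and \?. */
static const char one_or_more_b[] = "^ab\\{1,\\}c$";  /* GNU: ^ab\+c$ */
static const char zero_or_one_b[] = "^ab\\{0,1\\}c$"; /* GNU: ^ab\?c$ */
```

Calling bre_matches(one_or_more_b, "abbc") should report a match on any
conforming regcomp(), with or without the extensions, whereas the GNU
spellings in the comments only work where the extensions exist.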

It will break some things (one of the existing tests, for instance) that
relied on being able to escape an ordinary character (e.g. \b) and get that
ordinary character back. POSIX specifies this as producing undefined
results [2], though, so I don't feel terrible about breaking it.
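
To make the ambiguity concrete, here is a rough sketch of how a program could
check what its local regcomp() does with \b in an ERE. It is not part of the
patch; the names ere_matches, classify_escaped_b and the enum are invented
for illustration, and the result is whatever the local libc happens to do:

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical labels for how the local regex library treats "\b". */
enum escaped_b {
    ESC_B_LITERAL,   /* "\b" matches the ordinary character 'b' */
    ESC_B_BOUNDARY,  /* "\b" acts as a word boundary */
    ESC_B_REJECTED,  /* regcomp() refuses the pattern */
    ESC_B_OTHER      /* some other behavior */
};

/* Return 1 on match, 0 on no match, -1 if regcomp() rejects the ERE. */
static int
ere_matches(const char *pattern, const char *subject)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && subject != NULL);

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return (-1);
    rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return (rc == 0 ? 1 : 0);
}

static enum escaped_b
classify_escaped_b(void)
{
    /* Matches only if "\b" is a literal 'b' spanning the whole subject. */
    int as_literal = ere_matches("^\\b$", "b");
    /* Matches only if "\b" is zero-width: the subject has no 'b' in it. */
    int as_boundary = ere_matches("\\ba", "a");

    if (as_literal < 0 || as_boundary < 0)
        return (ESC_B_REJECTED);
    if (as_literal == 1 && as_boundary == 0)
        return (ESC_B_LITERAL);
    if (as_literal == 0 && as_boundary == 1)
        return (ESC_B_BOUNDARY);
    return (ESC_B_OTHER);
}
```

Since the behavior is undefined, code that wants a literal b should simply
write b, and none of the four outcomes can be relied on portably.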

If this seems desirable, I can work on cleaning it up and splitting it into
more consumable bites for FreeBSD's libc.

Thanks,

Kyle Evans

[1] http://files.kyle-evans.net/freebsd/libc-gnuext.diff
[2] http://pubs.opengroup.org/onlinepubs/009696899/basedefs/xbd_chap09.html#tag_09_03_03

