Replacing libgnuregex

Baptiste Daroussin bapt at FreeBSD.org
Sat Apr 15 16:18:10 UTC 2017


On Sat, Apr 15, 2017 at 01:02:42AM -0500, Kyle Evans wrote:
> On Fri, Apr 14, 2017 at 1:55 PM, Kyle Evans <kevans91 at ksu.edu> wrote:
> 
> > On Tue, Apr 11, 2017 at 3:20 PM, Kyle Evans <kevans91 at ksu.edu> wrote:
> >
> >>
> >> On the other hand, I think I could fairly easily implement most of these
> >> into libc/regex. Here's a summary of what this option entails adding to
> >> libc/regex, from what I've found:
> >>
> >> * Empty subexpressions(*)
> >> * Add missing quantifiers to BREs: \?, \+
> >> * Add branching to BREs: \|
> >> * Add backreferences (\1 through \9) to EREs
> >> * Add \w, \W, \s, and \S corresponding to [[:alnum:]], [^[:alnum:]],
> >> [[:space:]], and [^[:space:]] respectively
> >> * Add word boundaries and anchors:
> >> ** \b: word boundary
> >> ** \B: not word boundary
> >> ** \<: Start of word
> >> ** \>: End of word
> >> ** \`: Start of subject string
> >> ** \': End of subject string
> >>
> >> (*) I didn't actually find anything explicitly stating this as a GNU
> >> extension, but using it certainly isn't conformant to POSIX. It gets used
> >> a tiny bit in some ports, and we implement a workaround in bsdgrep(1) for
> >> the simplest case, the empty expression (""), so that it matches
> >> everything and produces zero-length matches.
> >>
> >> The main benefit of this is not having to maintain a completely separate
> >> regex parser and the potential for inconsistencies that come along with it.
> >> The downside is that this would seem to promote expressions that are not
> >> strictly POSIX conformant. Is this a problem? Is this a problem worth
> >> worrying about?
> >>
> >>
> > FYI: a patch showing what implementing all of the above in libc/regex
> > looks like is at [1]. Some cleanup is still in order and the test set
> > is not exhaustive, but this should implement all of the GNU extensions and
> > it's at least functional.
> >
> > It will break some things (like one of the tests, for instance) that
> > relied on being able to escape an ordinary character (e.g. \b) and get an
> > ordinary character. This is specified as producing undefined behavior [2],
> > though, so I don't feel terrible about breaking it.
> >
> > If this seems desirable, I can work on cleaning it up and splitting it
> > into more consumable bites for FreeBSD's libc.
> >
> > Thanks,
> >
> > Kyle Evans
> >
> > [1] http://files.kyle-evans.net/freebsd/libc-gnuext.diff
> > [2] http://pubs.opengroup.org/onlinepubs/009696899/basedefs/xbd_chap09.html#tag_09_03_03
> >
> 
> An amended version of this patch can be found here:
> https://files.kyle-evans.net/freebsd/libc-gnuext-2.diff
> 
> This one introduces a REG_POSIX flag for regcomp(3) that disables the GNU
> extensions for a more POSIX-conformant implementation, along with an
> amendment to regex.3 documenting the flag.
> 
> Instead of removing the tests that don't fail like they should under GNU
> extensions, I've restored them and added a 'P' flag to specify REG_POSIX
> and marked the failing tests as such to clearly denote that they require a
> more strict implementation.
> 
> Thanks,
> 
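
For reference, several of the extensions in the summary above have portable
POSIX spellings that should already work without any GNU extensions:
[[:alnum:]] and [[:space:]] for the character classes, \{1,\} and \{0,1\} in
place of \+ and \? in BREs, and \1 through \9 in BREs. A minimal sketch, using
only regcomp(3), regexec(3), and regfree(3); the helper name posix_match is
made up for illustration and is not part of the patch:

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from the patch): return 1 if `pattern` matches
 * somewhere in `text`, 0 if it does not, and -1 if the pattern fails to
 * compile. `cflags` is passed through to regcomp(3): 0 selects a BRE,
 * REG_EXTENDED selects an ERE.
 */
static int
posix_match(const char *pattern, const char *text, int cflags)
{
	regex_t re;
	int rc;

	/* REG_NOSUB: only a yes/no answer is needed, not match offsets. */
	if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
		return (-1);
	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return (rc == 0 ? 1 : 0);
}
```

With this helper, posix_match("\\(ab\\)\\1", "abab", 0) reports a match, and
posix_match("^[[:alnum:]]+$", "foo bar", REG_EXTENDED) does not.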

Thanks for working on this.

Just to follow up on this:

Have you tested the results with the AT&T testsuite for regex?

You can find it in the DragonFly source tree, at least:
https://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/abce74f49c2c19b069958a0b48de0a9987d14e35

It is also available somewhere online, but I don't remember where :)

Another approach would be to import libtre plus extensions into our libc (as
was done on DragonFly; it was actually a FreeBSD project that stalled).

Best regards,
Bapt


More information about the freebsd-hackers mailing list