svn commit: r363679 - in head: contrib/netbsd-tests/lib/libc/regex/data lib/libc/regex

Kyle Evans kevans at freebsd.org
Fri Jul 31 13:41:41 UTC 2020


On Fri, Jul 31, 2020 at 8:39 AM Li-Wen Hsu <lwhsu at freebsd.org> wrote:
>
> On Fri, Jul 31, 2020 at 9:50 AM Kyle Evans <kevans at freebsd.org> wrote:
> >
> > On Thu, Jul 30, 2020 at 8:47 PM Kyle Evans <kevans at freebsd.org> wrote:
> > >
> > > On Wed, Jul 29, 2020 at 10:53 PM Li-Wen Hsu <lwhsu at freebsd.org> wrote:
> > > >
> > > > On Thu, Jul 30, 2020 at 7:22 AM Kyle Evans <kevans at freebsd.org> wrote:
> > > > >
> > > > > Author: kevans
> > > > > Date: Wed Jul 29 23:21:56 2020
> > > > > New Revision: 363679
> > > > > URL: https://svnweb.freebsd.org/changeset/base/363679
> > > > >
> > > > > Log:
> > > > >   regex(3): Interpret many escaped ordinary characters as EESCAPE
> > > > >
> > > > >   In IEEE 1003.1-2008 [1] and earlier revisions, BRE/ERE grammar allows for
> > > > >   any character to be escaped, but "ORD_CHAR preceded by an unescaped
> > > > >   <backslash> character [gives undefined results]".
> > > > >
> > > > >   Historically, we've interpreted an escaped ordinary character as the
> > > > >   ordinary character itself. This becomes problematic when some extensions
> > > > >   give special meanings to an otherwise ordinary character
> > > > >   (e.g. GNU's \b, \s, \w), meaning we may have two different valid
> > > > >   interpretations of the same sequence.
> > > > >
> > > > >   To make this easier to deal with and given that the standard calls this
> > > > >   undefined, we should throw an error (EESCAPE) if we run into this scenario
> > > > >   to ease transition into a state where some escaped ordinaries are blessed
> > > > >   with a special meaning -- it will either error out or have extended
> > > > >   behavior, rather than have two entirely different versions of undefined
> > > > >   behavior that leave the consumer of regex(3) guessing as to what behavior
> > > > >   will be used or leaving them with false impressions.
> > > > >
> > > > >   This change bumps the symbol version of regcomp to FBSD_1.6 and provides the
> > > > >   old escape semantics for legacy applications, just in case one has an older
> > > > >   application that would immediately turn into a pumpkin because of an
> > > > >   extraneous escape that's embedded or otherwise critical to its operation.
> > > > >
> > > > >   This is the final piece needed before enhancing libregex with GNU extensions
> > > > >   and flipping the switch on bsdgrep.
> > > > >
> > > > >   [1] http://pubs.opengroup.org/onlinepubs/9699919799.2016edition/
> > > > >
> > > > >   PR:           229925 (exp-run, courtesy of antoine)
> > > > >   Differential Revision:        https://reviews.freebsd.org/D10510
> > > > >
> > > > > Modified:
> > > > >   head/contrib/netbsd-tests/lib/libc/regex/data/meta.in
> > > > >   head/contrib/netbsd-tests/lib/libc/regex/data/subexp.in
> > > > >   head/lib/libc/regex/Symbol.map
> > > > >   head/lib/libc/regex/regcomp.c
> > > >
> > > > I think 3 test cases need to be modified after this change:
> > > >
> > > > https://ci.freebsd.org/job/FreeBSD-head-amd64-test/16011/testReport/junit/lib.googletest.gtest_main/googletest-port-test/main/
> > > > https://ci.freebsd.org/job/FreeBSD-head-amd64-test/16011/testReport/junit/usr.bin.diff/diff_test/side_by_side/
> > > > https://ci.freebsd.org/job/FreeBSD-head-amd64-test/16011/testReport/junit/usr.bin.sed/sed2_test/hex_subst/
> > > >
> > >
> > > CC'ing asomers@ and ngie@, because ISTR they have some googletest stock.
> > >
> > > Testing my libregex GNU extensions revealed that I'm really not ready
> > > to commit that just yet. We have two options here for googletest:
> > >
> > > 1. Disable it and create a PR to be fixed when my changes are done,
> > > hopefully by the end of the week, or
> > > 2. Fix the expressions in
> > > contrib/googletest/googletest/test/googletest-port-test.cc to be POSIX
> > > compliant and upstream that.
> > >
> > > #2 is generally a replacement of \w -> [[:alnum:]] and \W ->
> > > [^[:alnum:]] and maybe \s -> [[:space:]].
> > >
> >
> > Sorry, to be more precise: "disable it" meaning mark that specific
> > test as an expected failure, or something similar.
>
> I don't think we should let a known issue generate lots of failure
> reports for more than 24 hours, so I suggest we go with 1) first. For
> 2), it would also be good for both libregex and googletest to be aware of
> the difference between POSIX and GNU extensions, but I am not sure what
> upstream thinks about this. Still worth trying, though.
>

Sure, if you have time and no one objects, please proceed with #1 (I have
no time at the moment myself) and I'll get it fixed this weekend, even if
I have to hold back implementation of some of the GNU extensions to
nab the few that googletest's tests care about.
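
For reference, option #2 quoted above is roughly a mechanical
substitution. A minimal sketch in Python follows; gnu_to_posix is a
hypothetical helper for illustration only, not part of googletest or
libregex. It assumes the escapes appear outside bracket expressions, and
it uses the mapping from the quoted message, which does not cover the
"_" that GNU's \w also matches.

```python
import re

# Mapping taken from the quoted message. GNU's \w also matches "_", so
# "[[:alnum:]_]" would be the closer equivalent where that matters.
GNU_TO_POSIX = {
    "w": "[[:alnum:]]",
    "W": "[^[:alnum:]]",
    "s": "[[:space:]]",
}


def gnu_to_posix(pattern: str) -> str:
    """Rewrite \\w, \\W and \\s into POSIX bracket expressions.

    Hypothetical helper: every other escape is left untouched, and
    escapes inside an existing [...] are not handled specially.
    """
    def replace_escape(match: re.Match) -> str:
        # Unknown escapes (including an escaped backslash) pass through.
        return GNU_TO_POSIX.get(match.group(1), match.group(0))

    # Consuming backslash + next character in one step keeps an escaped
    # backslash followed by "w" from being misread as \w.
    return re.sub(r"\\(.)", replace_escape, pattern, flags=re.DOTALL)
```

An expression such as \w+\s\W would come out as
[[:alnum:]]+[[:space:]][^[:alnum:]], while an escaped special character
such as \. is left alone.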

Thanks,

Kyle Evans

