git: d0ff5773cefa - main - libregex: fix our mapping for \w
Date: Fri, 08 Aug 2025 18:27:55 UTC
The branch main has been updated by kevans:
URL: https://cgit.FreeBSD.org/src/commit/?id=d0ff5773cefaf3fa41b1be3e44ca35bd9d5f68ee
commit d0ff5773cefaf3fa41b1be3e44ca35bd9d5f68ee
Author: Kyle Evans <kevans@FreeBSD.org>
AuthorDate: 2025-08-08 18:21:03 +0000
Commit: Kyle Evans <kevans@FreeBSD.org>
CommitDate: 2025-08-08 18:27:26 +0000
libregex: fix our mapping for \w
A small oversight in our implementation of \w is that it was mapped to
plain [[:alnum:]]. According to the GNU documentation, \w is
[[:alnum:]] plus underscore. The fix is trivial: add '_' to our set
explicitly, and amend our test set to confirm that _ is now matched.
PR: 287396
---
lib/libc/regex/regcomp.c | 1 +
lib/libregex/tests/gnuext.in | 6 +++---
2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/lib/libc/regex/regcomp.c b/lib/libc/regex/regcomp.c
index f34dc322d0bb..aebea2b02435 100644
--- a/lib/libc/regex/regcomp.c
+++ b/lib/libc/regex/regcomp.c
@@ -1183,6 +1183,7 @@ p_b_pseudoclass(struct parse *p, char c) {
/* PASSTHROUGH */
case 'w':
p_b_cclass_named(p, cs, "alnum");
+ CHadd(p, cs, '_');
break;
case 'S':
cs->invert = 1;
diff --git a/lib/libregex/tests/gnuext.in b/lib/libregex/tests/gnuext.in
index 8f49854235a9..3ce0f4af1b34 100644
--- a/lib/libregex/tests/gnuext.in
+++ b/lib/libregex/tests/gnuext.in
@@ -10,9 +10,9 @@ a\|b\|c b abc a
(ab)\1 - abab abab
\1(ab) C ESUBREG
(a)(b)(c)(d)(e)(f)(g)(h)(i)\9 - abcdefghii abcdefghii
-# \w, \W, \s, \S (alnum, ^alnum, space, ^space)
-\w+ - -%@a0X- a0X
-\w\+ b -%@a0X- a0X
+# \w, \W, \s, \S (_alnum, ^_alnum, space, ^space)
+\w+ - -%@a_0X- a_0X
+\w\+ b -%@a_0X- a_0X
\s+ - aSNTb SNT
\s\+ b aSNTb SNT
# Word boundaries (\b, \B, \<, \>, \`, \')
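
The intended behaviour can be checked without libregex by spelling the class out as a portable POSIX bracket expression. Per the commit message above, \w should match the same characters as [[:alnum:]_]. The following is a minimal sketch, not part of the commit. The helper name first_match is hypothetical, and it uses only the standard regcomp(3) and regexec(3) interface with REG_EXTENDED.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from the commit): copy the first match of the
 * extended regular expression `pattern` in `subject` into `out`.
 * Returns 0 on success, -1 if the pattern does not compile, does not
 * match, or the match does not fit in `out`.
 */
static int
first_match(const char *pattern, const char *subject, char *out, size_t outsz)
{
	regex_t re;
	regmatch_t m;
	size_t len;
	int rc;

	if (regcomp(&re, pattern, REG_EXTENDED) != 0)
		return (-1);
	rc = regexec(&re, subject, 1, &m, 0);
	regfree(&re);
	if (rc != 0)
		return (-1);

	len = (size_t)(m.rm_eo - m.rm_so);
	if (len >= outsz)
		return (-1);
	memcpy(out, subject + m.rm_so, len);
	out[len] = '\0';
	return (0);
}
```

Called on the subject string from the updated test, "-%@a_0X-", the pattern "[[:alnum:]_]+" yields "a_0X", which is the result the amended gnuext.in lines expect from \w+. The pattern "[[:alnum:]]+" stops at the underscore and yields only "a", which is the behaviour the commit corrects.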