git: d0ff5773cefa - main - libregex: fix our mapping for \w

From: Kyle Evans <kevans_at_FreeBSD.org>
Date: Fri, 08 Aug 2025 18:27:55 UTC

The branch main has been updated by kevans:

URL: https://cgit.FreeBSD.org/src/commit/?id=d0ff5773cefaf3fa41b1be3e44ca35bd9d5f68ee

commit d0ff5773cefaf3fa41b1be3e44ca35bd9d5f68ee
Author:     Kyle Evans <kevans@FreeBSD.org>
AuthorDate: 2025-08-08 18:21:03 +0000
Commit:     Kyle Evans <kevans@FreeBSD.org>
CommitDate: 2025-08-08 18:27:26 +0000

    libregex: fix our mapping for \w
    
    Our implementation of \w mapped it to exactly [[:alnum:]], which is
    not quite right.  According to the GNU documentation, \w is
    [[:alnum:]] plus underscore.  The fix is trivial: add '_' to our set
    explicitly, and amend our test set to confirm that _ is included.
    
    PR:             287396
---
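
For readers who want to check the intended semantics without libregex, the
sketch below spells out the set that \w should now cover as the portable
POSIX bracket expression [[:alnum:]_] and runs it against the same subject
string used in the updated test.  The helper name first_word() is
hypothetical and is not part of libc or libregex; it only assumes the
standard regcomp(3)/regexec(3) interface.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of libregex: copy the first run of
 * "word" characters in subject into out, using the portable bracket
 * expression [[:alnum:]_]+ in place of the GNU-style \w+.
 * Returns 0 on a match, or -1 if there is no match, the pattern
 * fails to compile, or out is too small to hold the match.
 */
static int
first_word(const char *subject, char *out, size_t outlen)
{
	regex_t re;
	regmatch_t m;
	size_t len;
	int ret = -1;

	if (regcomp(&re, "[[:alnum:]_]+", REG_EXTENDED) != 0)
		return (-1);
	if (regexec(&re, subject, 1, &m, 0) == 0) {
		/* rm_so/rm_eo delimit the leftmost-longest match. */
		len = (size_t)(m.rm_eo - m.rm_so);
		if (len < outlen) {
			memcpy(out, subject + m.rm_so, len);
			out[len] = '\0';
			ret = 0;
		}
	}
	regfree(&re);
	return (ret);
}
```

Called as first_word("-%@a_0X-", buf, sizeof(buf)), this fills buf with
"a_0X", which is the match the amended gnuext.in entries expect from \w+.
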
 lib/libc/regex/regcomp.c     | 1 +
 lib/libregex/tests/gnuext.in | 6 +++---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/lib/libc/regex/regcomp.c b/lib/libc/regex/regcomp.c
index f34dc322d0bb..aebea2b02435 100644
--- a/lib/libc/regex/regcomp.c
+++ b/lib/libc/regex/regcomp.c
@@ -1183,6 +1183,7 @@ p_b_pseudoclass(struct parse *p, char c) {
 		/* PASSTHROUGH */
 	case 'w':
 		p_b_cclass_named(p, cs, "alnum");
+		CHadd(p, cs, '_');
 		break;
 	case 'S':
 		cs->invert = 1;
diff --git a/lib/libregex/tests/gnuext.in b/lib/libregex/tests/gnuext.in
index 8f49854235a9..3ce0f4af1b34 100644
--- a/lib/libregex/tests/gnuext.in
+++ b/lib/libregex/tests/gnuext.in
@@ -10,9 +10,9 @@ a\|b\|c	b	abc	a
 (ab)\1	-	abab	abab
 \1(ab)	C	ESUBREG
 (a)(b)(c)(d)(e)(f)(g)(h)(i)\9	-	abcdefghii	abcdefghii
-# \w, \W, \s, \S (alnum, ^alnum, space, ^space)
-\w+	-	-%@a0X-	a0X
-\w\+	b	-%@a0X-	a0X
+# \w, \W, \s, \S (_alnum, ^_alnum, space, ^space)
+\w+	-	-%@a_0X-	a_0X
+\w\+	b	-%@a_0X-	a_0X
 \s+	-	aSNTb	SNT
 \s\+	b	aSNTb	SNT
 # Word boundaries (\b, \B, \<, \>, \`, \')