git: d64438a09dc8 - stable/14 - libregex: fix our mapping for \w

From: Kyle Evans <kevans_at_FreeBSD.org>
Date: Mon, 11 May 2026 15:26:30 UTC

The branch stable/14 has been updated by kevans:

URL: https://cgit.FreeBSD.org/src/commit/?id=d64438a09dc8e466c969fbe94c1a2fa500554da4

commit d64438a09dc8e466c969fbe94c1a2fa500554da4
Author:     Kyle Evans <kevans@FreeBSD.org>
AuthorDate: 2025-08-08 18:21:03 +0000
Commit:     Kyle Evans <kevans@FreeBSD.org>
CommitDate: 2026-05-11 15:20:50 +0000

    libregex: fix our mapping for \w
    
    A small oversight in our implementation of \w is that it mapped to
    [[:alnum:]] alone.  According to the GNU documentation, \w is
    [[:alnum:]] plus underscore.  The fix is trivial: add '_' to our set
    explicitly, and amend our test set to confirm that _ is included.
    
    PR:             287396
    (cherry picked from commit d0ff5773cefaf3fa41b1be3e44ca35bd9d5f68ee)
---
 lib/libc/regex/regcomp.c     | 1 +
 lib/libregex/tests/gnuext.in | 6 +++---
 2 files changed, 4 insertions(+), 3 deletions(-)
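
The difference described in the commit message can be reproduced with
portable POSIX bracket expressions, without relying on \w itself (which is
a GNU extension and only available where the regex library supports it).
The following is a minimal sketch, not part of this commit: first_match()
is a hypothetical helper, and the input string is borrowed from the
updated gnuext.in test.

```c
#include <regex.h>
#include <string.h>

/*
 * Copy the first match of `pattern` (a POSIX ERE) in `input` into `out`.
 * Returns 0 on a match, 1 on no match, and -1 if the pattern fails to
 * compile or the match does not fit in `out`.
 * first_match() is a hypothetical helper for illustration only; it is
 * not part of libc or libregex.
 */
static int
first_match(const char *pattern, const char *input, char *out, size_t outlen)
{
	regex_t re;
	regmatch_t m;
	size_t len;
	int rc;

	if (regcomp(&re, pattern, REG_EXTENDED) != 0)
		return (-1);
	rc = regexec(&re, input, 1, &m, 0);
	regfree(&re);
	if (rc != 0)
		return (1);

	/* rm_so/rm_eo are byte offsets of the match within `input`. */
	len = (size_t)(m.rm_eo - m.rm_so);
	if (len >= outlen)
		return (-1);
	memcpy(out, input + m.rm_so, len);
	out[len] = '\0';
	return (0);
}
```

Called on the test input "-%@a_0X-", the pattern "[[:alnum:]]+" (the old
mapping) stops at the underscore and yields "a", while "[[:alnum:]_]+"
(the set that \w should denote) yields "a_0X", which is what the amended
test expects.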

diff --git a/lib/libc/regex/regcomp.c b/lib/libc/regex/regcomp.c
index eae4d02657e8..d1f0fc0d862f 100644
--- a/lib/libc/regex/regcomp.c
+++ b/lib/libc/regex/regcomp.c
@@ -1170,6 +1170,7 @@ p_b_pseudoclass(struct parse *p, char c) {
 		/* PASSTHROUGH */
 	case 'w':
 		p_b_cclass_named(p, cs, "alnum");
+		CHadd(p, cs, '_');
 		break;
 	case 'S':
 		cs->invert = 1;
diff --git a/lib/libregex/tests/gnuext.in b/lib/libregex/tests/gnuext.in
index 8f49854235a9..3ce0f4af1b34 100644
--- a/lib/libregex/tests/gnuext.in
+++ b/lib/libregex/tests/gnuext.in
@@ -10,9 +10,9 @@ a\|b\|c	b	abc	a
 (ab)\1	-	abab	abab
 \1(ab)	C	ESUBREG
 (a)(b)(c)(d)(e)(f)(g)(h)(i)\9	-	abcdefghii	abcdefghii
-# \w, \W, \s, \S (alnum, ^alnum, space, ^space)
-\w+	-	-%@a0X-	a0X
-\w\+	b	-%@a0X-	a0X
+# \w, \W, \s, \S (_alnum, ^_alnum, space, ^space)
+\w+	-	-%@a_0X-	a_0X
+\w\+	b	-%@a_0X-	a_0X
 \s+	-	aSNTb	SNT
 \s\+	b	aSNTb	SNT
 # Word boundaries (\b, \B, \<, \>, \`, \')