[Bug 281710] RegEXP bug in bracket expression [^...] - sed(1), grep(1), re_format(7)
Date: Wed, 25 Sep 2024 20:44:12 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=281710
--- Comment #9 from commit-hook@FreeBSD.org ---
A commit in branch stable/13 references this bug:
URL: https://cgit.FreeBSD.org/src/commit/?id=d96ce6d000703f3f57d9214b741e16cc7741d77e
commit d96ce6d000703f3f57d9214b741e16cc7741d77e
Author: Bill Sommerfeld <sommerfeld@hamachi.org>
AuthorDate: 2023-12-21 03:46:14 +0000
Commit: Kyle Evans <kevans@FreeBSD.org>
CommitDate: 2024-09-25 20:42:28 +0000
regex: mixed sets are misidentified as singletons
Fix the "singleton" function, which regcomp() uses to turn a character
set match into an exact character match when the set has exactly one
element.
The underlying cset representation is complex; most critically, it
records "small" characters (code points below 128 or 256, depending on
the locale) in a bit vector, and "wide" characters in a secondary array.
Unfortunately, the "singleton" function treated a cset as a singleton
if either the "small" or the "wide" set had exactly one element; it
then ignored the other set.
The easiest way to demonstrate this bug:
$ export LANG=C.UTF-8
$ echo 'a' | grep '[abà]'
It should match (and print "a"), but instead it does not match, because
the set is misidentified as a singleton: the accented character is the
only "wide" member, so the "small" members a and b are ignored.
PR: 281710
Reviewed by: kevans, yuripv
Obtained from: illumos
(cherry picked from commit 8f7ed58a15556bf567ff876e1999e4fe4d684e1d)
lib/libc/regex/regcomp.c | 25 ++++++++++++++++++-----
lib/libc/tests/regex/multibyte.sh | 43 ++++++++++++++++++++++++++++++++++++++-
2 files changed, 62 insertions(+), 6 deletions(-)