[Bug 281710] RegEXP bug in bracket expression [^...] - sed(1), grep(1), re_format(7)
Date: Wed, 25 Sep 2024 13:30:34 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=281710
Bug ID: 281710
Summary: RegEXP bug in bracket expression [^...] - sed(1),
grep(1), re_format(7)
Product: Base System
Version: 14.1-RELEASE
Hardware: amd64
OS: Any
Status: New
Severity: Affects Some People
Priority: ---
Component: standards
Assignee: standards@FreeBSD.org
Reporter: erichanskrs@gmail.com
There appears to be a bug in the regular expression handling used by FreeBSD's
sed(1) and grep(1), and described in re_format(7): a non-matching bracket
expression [^...] fails to match when its list contains an accented character.
Both modern REs and basic REs are affected.
-- Short examples
Command lines 202, 203 and 207 show unexpected behaviour: each prints nothing.
[200] # echo '9a' | /usr/bin/sed -En 's/([^a])(a)/-\1-\2-/p'
-9-a-
[201] # echo '9a' | /usr/bin/sed -n 's/\([^a]\)\(a\)/-\1-\2-/p'
-9-a-
[202] # echo '9â' | /usr/bin/sed -n 's/\([^â]\)\(â\)/-\1-\2-/p' # <--
[203] # echo '9â' | /usr/bin/sed -En 's/([^â])(â)/-\1-\2-/p' # <--
[204] # echo '9â' | /usr/local/bin/gsed -En 's/([^â])(â)/-\1-\2-/p'
-9-â-
[205] # echo 'ââ' | /usr/bin/sed -En 's/([â])(â)/-\1-\2-/p'
-â-â-
[206] # echo 'ââ' | /usr/local/bin/gsed -En 's/([â])(â)/-\1-\2-/p'
-â-â-
[207] # echo '9â' | /usr/bin/grep -E '[^â]â' # <--
[208] #
Same results with characters like 'ç' and 'é'.
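The checks above can be repeated for several characters in one go. The following is only a minimal sketch, not part of the original report: `check_negated` is a hypothetical helper name, and the `SED` environment variable is an assumed convenience for switching between /usr/bin/sed and gsed.

```shell
#!/bin/sh
# Hypothetical helper: check_negated SED CHAR
# Feeds "9CHAR" to SED and expects the non-matching bracket expression
# [^CHAR] to match the leading '9', so that the output is "-9-CHAR-".
check_negated() {
    sed_cmd=$1
    ch=$2
    # An unusable sed binary is treated like empty output.
    out=$(printf '9%s\n' "$ch" | "$sed_cmd" -En "s/([^$ch])($ch)/-\1-\2-/p" 2>/dev/null) || out=
    if [ "$out" = "-9-$ch-" ]; then
        printf 'ok %s [^%s]\n' "$sed_cmd" "$ch"
    else
        printf 'FAIL %s [^%s] (got: "%s")\n' "$sed_cmd" "$ch" "$out"
    fi
}

# 'a' is the ASCII control case; the others are the accented
# characters mentioned in this report.
sed_bin=${SED:-/usr/bin/sed}
for ch in a â ç é; do
    check_negated "$sed_bin" "$ch"
done
```

Running it once as is and once with `SED=/usr/local/bin/gsed` should allow a side-by-side comparison like the one between command lines 203 and 204.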
The issue was first reported in a forum thread (see link below) and concerns
Unicode characters.
-- Reference
FreeBSD forum link:
https://forums.freebsd.org/threads/bug-in-regexp-sed-1-grep-1-and-re_format-7.95088/
re_format(7):
"
DESCRIPTION
[...]
A bracket expression is a list of characters enclosed in `[]'. It normally
matches any single character from the list (but see below). If the list
begins with `^', it matches any single character (but see below) not from
the rest of the list.
"
FreeBSD intends to conform to POSIX, which specifies the same behaviour:
https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap09.html#tag_09_03_05
"
3. A non-matching list expression begins with a <circumflex> ('^'), and the
matching behavior shall be the logical inverse of the corresponding matching
list expression (the same bracket expression but without the leading
<circumflex>). For example, since the RE "[abc]" only matches 'a', 'b', or 'c',
it follows that "[^abc]" is an RE that matches any character except 'a', 'b',
or 'c'. It is unspecified whether a non-matching list expression matches a
multi-character collating element that is not matched by any of the
expressions. The <circumflex> shall have this special meaning only when it
occurs first in the list, immediately following the <left-square-bracket>.
"
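The "logical inverse" requirement quoted above can be turned into a small check: for a single input character, exactly one of [LIST] and [^LIST] should match. This is a minimal sketch under that reading of the text; `inverse_holds` is a hypothetical helper name, and plain `grep` from PATH is assumed in the loop.

```shell
#!/bin/sh
# Hypothetical helper: inverse_holds GREP CHAR LIST
# Succeeds when exactly one of the bracket expressions [LIST] and [^LIST]
# matches a line consisting of the single character CHAR.
inverse_holds() {
    grep_cmd=$1
    ch=$2
    list=$3
    pos=no
    neg=no
    if printf '%s\n' "$ch" | "$grep_cmd" -Eq "^[$list]\$" 2>/dev/null; then
        pos=yes
    fi
    if printf '%s\n' "$ch" | "$grep_cmd" -Eq "^[^$list]\$" 2>/dev/null; then
        neg=yes
    fi
    # Logical inverse: the two results must differ.
    [ "$pos" != "$neg" ]
}

# Compare a few input characters against the list 'â' used in this report.
for ch in 9 a â; do
    if inverse_holds grep "$ch" 'â'; then
        printf 'consistent:   %s against [â] and [^â]\n' "$ch"
    else
        printf 'inconsistent: %s against [â] and [^â]\n' "$ch"
    fi
done
```

The result depends on the grep implementation and on the active locale, so the sketch is meant to be run in the same environment as the failing commands.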
-- Context of my OS and programs:
[100] # uname -a
FreeBSD q210 14.1-RELEASE-p5 FreeBSD 14.1-RELEASE-p5 GENERIC amd64
[101] # pkg which /usr/local/bin/ggrep
/usr/local/bin/ggrep was installed by package gnugrep-3.11
[102] # pkg which /usr/local/bin/gsed
/usr/local/bin/gsed was installed by package gsed-4.9
[103] # locale
LANG=C.UTF-8
LC_CTYPE="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_TIME="C.UTF-8"
LC_NUMERIC="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES="C.UTF-8"
LC_ALL=