[Bug 243229] awk length() function in base system produces incorrect results for UTF-8 strings
Date: Wed, 19 May 2021 14:59:46 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=243229
Frédéric Fauberteau <triaxx@NetBSD.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |triaxx@NetBSD.org
--- Comment #2 from Frédéric Fauberteau <triaxx@NetBSD.org> ---
I don't know whether this issue is related to this bug report, but the
following command prints 'bin':
% echo "bin" | LANG=en_US awk '$1 ~ /^[\t -~]/ {print $0}'
while this one prints nothing:
% echo "bin" | LANG=en_US.UTF-8 awk '$1 ~ /^[\t -~]/ {print $0}'
The range from ' ' to '~' includes alphabetic characters when the locale is
en_US (no UTF-8 suffix), but not when the locale is en_US.UTF-8.
Note that '/^[\t -~]/' does match "bin" with C.UTF-8, so not every UTF-8
locale is affected.
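
To compare the locales side by side, something like the following could be
used. It is only a sketch: range_matches is a hypothetical helper name, and it
sets LC_ALL rather than LANG (an assumption on my part) so that LC_* variables
already present in the environment cannot override the locale under test. If
a locale is not installed, awk will most likely fall back to the C locale and
report a match.

```shell
#!/bin/sh
# Hypothetical helper: report whether the bracket range [\t -~] matches the
# string "bin" under the locale given as the first argument.
range_matches() {
    # awk exits with status 0 when the pattern matched, 1 otherwise.
    if echo "bin" | LC_ALL="$1" awk '$1 ~ /^[\t -~]/ { found = 1 } END { exit !found }'; then
        echo "$1: match"
    else
        echo "$1: no match"
    fi
}

# Locales mentioned in this comment, plus plain C as a baseline.
for loc in C en_US en_US.UTF-8 C.UTF-8; do
    range_matches "$loc"
done
```

Range expressions in bracket expressions are only well defined in the POSIX
(C) locale, so replacing ' -~' with a character class such as [:print:] may be
a more portable way to express the same test.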