[Bug 243229] awk length() function in base system produces incorrect results for UTF-8 strings

From: <bugzilla-noreply_at_freebsd.org>
Date: Wed, 19 May 2021 14:59:46 +0000

Frédéric Fauberteau <triaxx_at_NetBSD.org> changed:

           What    |Removed                     |Added
                 CC|                            |triaxx_at_NetBSD.org

--- Comment #2 from Frédéric Fauberteau <triaxx_at_NetBSD.org> ---
I don't know if this issue is related to that bug report, but the following
command prints 'bin':
% echo "bin" | LANG=en_US awk '$1 ~ /^[\t -~]/ {print $0}'
while this one prints nothing:
% echo "bin" | LANG=en_US.UTF-8 awk '$1 ~ /^[\t -~]/ {print $0}'

The range from ' ' to '~' includes alphabetic characters under en_US but
not under en_US.UTF-8.

Note, however, that '/^[\t -~]/' does match "bin" under C.UTF-8, so the
behavior is not common to all UTF-8 locales.
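
To make the comparison easier to repeat, the checks above can be wrapped in a
small script. This is only a sketch: probe_range is a hypothetical helper name,
not something from the report, and plain C is added as a baseline that the
report does not mention. It sets LC_ALL rather than LANG so that no other
locale variable in the environment can mask the locale under test. If a locale
is not installed, awk will typically fall back to C behavior without saying so,
so results for missing locales should be read with care.

```shell
#!/bin/sh
# Hypothetical helper (not from the report): run the reported pattern
# under one locale and say whether the input "bin" matched.
probe_range() {
    # LC_ALL takes precedence over LANG and the individual LC_* variables,
    # so awk sees exactly the locale passed as $1.
    if echo "bin" | LC_ALL="$1" awk '$1 ~ /^[\t -~]/ { found = 1 } END { exit !found }'
    then
        echo "$1: match"
    else
        echo "$1: no match"
    fi
}

# The locales named in the report, plus plain C as an assumed baseline.
for loc in C C.UTF-8 en_US en_US.UTF-8; do
    probe_range "$loc"
done
```

Running it on the affected system and on a second awk implementation, if one
is available, should show whether the difference follows the locale or the
awk binary.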

You are receiving this mail because:
You are the assignee for the bug.
Received on Wed May 19 2021 - 14:59:46 UTC