[Bug 243229] awk length() function in base system produces incorrect results for UTF-8 strings

From: <bugzilla-noreply_at_freebsd.org>
Date: Wed, 19 May 2021 14:59:46 UTC

Frédéric Fauberteau <triaxx@NetBSD.org> changed:

           What    |Removed                     |Added
                 CC|                            |triaxx@NetBSD.org

--- Comment #2 from Frédéric Fauberteau <triaxx@NetBSD.org> ---
I don't know if this issue is related to that bug report, but the following
command prints 'bin':
% echo "bin" | LANG=en_US awk '$1 ~ /^[\t -~]/ {print $0}'
while this one prints nothing:
% echo "bin" | LANG=en_US.UTF-8 awk '$1 ~ /^[\t -~]/ {print $0}'

The range from ' ' to '~' includes alphabetic characters when the locale is
not UTF-8 (LANG=en_US), but not when the locale is en_US.UTF-8.

Note, however, that '/^[\t -~]/' does match "bin" with C.UTF-8.
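
For comparison, here is a minimal sketch of the same match with the locale
forced to C, where the range ' ' to '~' should be evaluated by byte value
(0x20 to 0x7e) rather than by locale collation order. The helper name
match_printable_ascii is hypothetical and not part of the original report,
and using LC_ALL=C is an assumed workaround, not a fix for the underlying
behaviour:

```shell
#!/bin/sh
# Hypothetical helper (not from the bug report): run the awk match from
# the comment above with LC_ALL=C, which overrides LANG, so the bracket
# range ' '-'~' is compared byte by byte.
match_printable_ascii() {
    printf '%s\n' "$1" | LC_ALL=C awk '$1 ~ /^[\t -~]/ { print $0 }'
}

# Same input as in the report.
match_printable_ascii "bin"
```

If this prints 'bin' while the LANG=en_US.UTF-8 variant prints nothing, the
difference would point at how the range is interpreted in that locale rather
than at the input itself.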
