[Bug 243229] awk in base system does not work with UTF-8 strings correctly

bugzilla-noreply at freebsd.org bugzilla-noreply at freebsd.org
Thu Jan 9 21:20:50 UTC 2020


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=243229

            Bug ID: 243229
           Summary: awk in base system does not work with UTF-8 strings
                    correctly
           Product: Base System
           Version: 12.1-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: misc
          Assignee: bugs at FreeBSD.org
          Reporter: sv at ulstu.ru

I tried using the length() function with UTF-8 strings, and it produces an
incorrect result. The function counts bytes rather than characters. Each
Cyrillic letter is encoded as two bytes in UTF-8, so the reported length is
twice the number of characters (12 instead of 6 for the example below).

Steps to reproduce (for LANG=ru_RU.UTF-8):

echo 'Привет' | awk '{print length($1)}'

With awk from lang/gawk, length() returns the correct number of characters
for the same UTF-8 string.
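
The doubling can also be checked without awk. The following is only a minimal
sketch, assuming the script is saved as UTF-8 and that tr(1) accepts octal byte
ranges under LC_ALL=C. The variable names are illustrative. It counts the bytes
in the test string, then counts code points by discarding the UTF-8
continuation bytes:

```shell
#!/bin/sh
# Illustrative sketch: variable names are hypothetical, not from the report.
s='Привет'

# Byte count: each Cyrillic letter occupies two bytes in UTF-8.
# The trailing tr strips the padding some wc implementations add.
bytes=$(printf '%s' "$s" | LC_ALL=C wc -c | tr -d ' ')

# Code point count: drop UTF-8 continuation bytes (0x80-0xBF) and count
# the remaining lead bytes, so the result does not depend on the locale.
chars=$(printf '%s' "$s" | LC_ALL=C tr -d '\200-\277' | LC_ALL=C wc -c | tr -d ' ')

echo "bytes=$bytes chars=$chars"
```

Under those assumptions this should print bytes=12 chars=6. The first number
matches the doubled length described above, and the second is the character
count that lang/gawk reports.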

Are you planning to update awk in the base system to support UTF-8 strings in
the near future?

-- 
You are receiving this mail because:
You are the assignee for the bug.

