[Bug 225692] iswprint() wrong for some FULL WIDTH characters in UTF-8 locale

bugzilla-noreply at freebsd.org bugzilla-noreply at freebsd.org
Mon Feb 5 18:34:04 UTC 2018


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=225692

            Bug ID: 225692
           Summary: iswprint() wrong for some FULL WIDTH characters in
                    UTF-8 locale
           Product: Base System
           Version: 11.1-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: bin
          Assignee: freebsd-bugs at FreeBSD.org
          Reporter: jkerian+freebsdbugs at gmail.com

Created attachment 190345
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=190345&action=edit
Simple iswprint test

When I run ls -B on one of my files, the UTF-8 byte sequence 0xef 0xbc 0x88 is
replaced as though it were unprintable. According to
http://www.utf8-chartable.de/unicode-utf8-table.pl?start=65280&utf8=0x, this
sequence encodes U+FF08, a fullwidth left parenthesis.

According to http://demo.icu-project.org/icu-bin/ubrowse?ch=FF08, U+FF08 should
be a perfectly printable character in a UTF-8 locale. Looking at the ls.c
source code eventually led me to iswprint().

I wrote a simple program that prints the character class masks and then prints
the iswprint() results for a series of characters in a few locales. (I am
attaching it in case of link rot; the code and Linux results can be seen at
https://wandbox.org/permlink/ZDc36tQhh7BLRpBx)

Linux and OSX have some odd behavior around the classes, but U+2002 and U+FF08
are both perfectly printable on both systems in the UTF-8 locales. FreeBSD, on
the other hand, returns 1 only for iswprint(0x64), when it should also report
U+2002 and U+FF08 as printable in the UTF-8 locales.

On my box, running FreeBSD 11.1-RELEASE-p4 GENERIC amd64, I get the following
results:

[dev ~/test/iswprint]$ ./a.out
alnum:0x400100, cntrl:0x200, ideogram:0x80000, print:0x40000, space:0x4000,
xdigit:0x10000, alpha:0x100, digit:0x400, lower:0x1000, punct:0x2000,
special:0x100000, blank:0x20000, graph:0x800, phonogram:0x200000,
rune:0xffffff00, upper:0x8000,
Default Locale is: C
Character 0x64 is in classes: alnum print xdigit alpha lower graph rune
in C locale, iswprint(0x64) = 1
in en_US.UTF-8 locale, iswprint(0x64) = 1
in ja_JP.UTF-8 locale, iswprint(0x64) = 1

Character 0x2002 is in classes: space rune
in C locale, iswprint(0x2002) = 0
in en_US.UTF-8 locale, iswprint(0x2002) = 0
in ja_JP.UTF-8 locale, iswprint(0x2002) = 0

Character 0xff08 is in classes: rune
in C locale, iswprint(0xff08) = 0
in en_US.UTF-8 locale, iswprint(0xff08) = 0
in ja_JP.UTF-8 locale, iswprint(0xff08) = 0

Character 0x2002 is in classes: space rune
in C locale, iswprint(0x2002) = 0
in en_US.UTF-8 locale, iswprint(0x2002) = 0
in ja_JP.UTF-8 locale, iswprint(0x2002) = 0

Character 0x82 is in classes: cntrl rune
in C locale, iswprint(0x82) = 0
in en_US.UTF-8 locale, iswprint(0x82) = 0
in ja_JP.UTF-8 locale, iswprint(0x82) = 0

I confirmed with a few other FreeBSD users that they get the same results.

-- 
You are receiving this mail because:
You are the assignee for the bug.

