FreeBSD iswprint() returns false for U+FF08 in UTF-8 locale

Joseph Kerian jkerian at gmail.com
Wed Jan 31 00:49:45 UTC 2018


I recently searched one of my drives for files containing "unprintable
characters" due to some issues I was seeing with file-listing programs.

When I run ls -B on one of the files, the UTF-8 byte sequence 0xef 0xbc 0x88
appears to be the culprit. According to
http://www.utf8-chartable.de/unicode-utf8-table.pl?start=65280&utf8=0x,
this should be U+FF08, a fullwidth left parenthesis. This makes some sense,
given the file.

According to http://demo.icu-project.org/icu-bin/ubrowse?ch=FF08, U+FF08
should be a perfectly printable character in a UTF-8 locale. Looking at the
ls.c source code eventually led me to iswprint().

I wrote a simple test program that prints the character enums and then prints
the iswprint() results for a series of characters in a few locales:
https://wandbox.org/permlink/ZDc36tQhh7BLRpBx

Linux and OSX have some odd behavior around the classes, but U+2002 and
U+FF08 are both perfectly printable on both systems. FreeBSD, on the other
hand, only returns 1 for iswprint(0x64). Results from my box are here:
https://gist.github.com/anonymous/0f21e139ae10c8c7996e7c056d686a7b
The results on that wandbox link are pretty typical for Linux systems.

(My box is running: FreeBSD 11.1-RELEASE-p4 GENERIC amd64)

Is this a bug?  Am I missing a pkg/port to properly support UTF-8?

-- 
--
Joe Kerian
Email: jkerian at gmail.com

