printf(1) and UTF-8 multi-byte chars

John Levine johnl at iecc.com
Sun Oct 18 15:48:46 UTC 2020


In article <slrnroo8n9.1iu4.naddy at lorvorc.mips.inka.de> you write:
>On 2020-10-17, Matthias Apitz <guru at unixarea.de> wrote:
>
>> This means the output of printf(1) is byte oriented and not
>> character oriented.
>
>This conforms to POSIX.

I don't think there is any useful middle ground between counting bytes
and full Unicode typesetting. Some Unicode characters are half- or
double-width, particularly in East Asian languages, and many combine
with adjacent characters depending on context. For example, the
character ö can be the single character U+00F6, which is two UTF-8
bytes, or a lower case o (U+006F) followed by a combining diaeresis
(U+0308), which is three UTF-8 bytes, but it is one column wide either
way.
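
To make the ö example concrete, here is a rough Python sketch that
reports bytes, code points, and an estimated column width for a string.
The describe() helper is a made-up name for illustration, and its width
rule is a deliberate simplification (it skips combining marks and counts
East Asian wide or fullwidth characters as two columns). It is not what
printf(1) or any terminal actually does.

```python
import unicodedata


def describe(s):
    """Return (UTF-8 byte count, code point count, estimated columns).

    Hypothetical helper: the column estimate is an assumption that
    ignores ambiguous-width characters, joiners, and control characters.
    """
    nbytes = len(s.encode("utf-8"))
    npoints = len(s)
    width = 0
    for ch in s:
        if unicodedata.combining(ch):
            continue  # a combining mark shares the column of its base
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2  # wide or fullwidth character
        else:
            width += 1
    return nbytes, npoints, width


precomposed = "\u00f6"   # ö as a single code point
decomposed = "o\u0308"   # o followed by a combining diaeresis
wide = "\u65e5"          # a CJK ideograph, normally two columns

print(describe(precomposed))  # (2, 1, 1)
print(describe(decomposed))   # (3, 2, 1)
print(describe(wide))         # (3, 1, 2)
```

The two spellings of ö differ in both bytes and code points but take the
same single column, so neither count is a usable stand-in for width.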



More information about the freebsd-questions mailing list