printf(1) and UTF-8 multi-byte chars
johnl at iecc.com
Sun Oct 18 15:48:46 UTC 2020
In article <slrnroo8n9.1iu4.naddy at lorvorc.mips.inka.de> you write:
>On 2020-10-17, Matthias Apitz <guru at unixarea.de> wrote:
>> This means the output of printf(1) is byte oriented and not
>> character oriented.
>This conforms to POSIX.
I don't think there is any useful middle ground between counting bytes
and full Unicode typesetting. Some Unicode characters are half- or
double-width, particularly in East Asian languages, and many combine
with adjacent characters depending on context. For example, the
character ö can be the single code point U+00F6, which is two UTF-8
bytes, or lower case o (U+006F) followed by a combining diaeresis
(U+0308), which together are three UTF-8 bytes, but it is one column
wide either way.
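
To make the three different counts concrete, here is a rough Python
sketch that compares bytes, code points, and an approximate column
width for both spellings of ö. The helper names display_width and
describe are made up for illustration, and the width rule (skip
combining marks, count East Asian wide and fullwidth characters as two
columns) is a simplifying assumption, nowhere near what a real
terminal or typesetter does.

```python
import unicodedata


def display_width(s):
    """Approximate column width; an illustrative heuristic, not a standard."""
    width = 0
    for ch in s:
        # Combining marks attach to the preceding character and take no column.
        if unicodedata.combining(ch):
            continue
        # East Asian Wide (W) and Fullwidth (F) characters take two columns.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def describe(s):
    """Return (UTF-8 byte count, code point count, approximate width)."""
    return len(s.encode("utf-8")), len(s), display_width(s)


precomposed = "\u00f6"   # ö as a single code point
decomposed = "o\u0308"   # o followed by a combining diaeresis

print(describe(precomposed))   # (2, 1, 1)
print(describe(decomposed))    # (3, 2, 1)
print(describe("\u65e5"))      # (3, 1, 2), a double-width CJK character
```

A byte-oriented printf(1) pads according to the first number in each
tuple, so none of the three would line up in a fixed-width column
without something like the third.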