printf(1) and UTF-8 multi-byte chars

John Levine johnl at
Sun Oct 18 15:48:46 UTC 2020

In article <slrnroo8n9.1iu4.naddy at> you write:
>On 2020-10-17, Matthias Apitz <guru at> wrote:
>> This means the output of printf(1) is byte oriented and not
>> character oriented.
>This conforms to POSIX.

I don't think there is any useful middle ground between counting bytes
and full Unicode typesetting. Some Unicode characters are half- or
double-width, particularly in East Asian languages, and many combine
with adjacent characters depending on context. For example, the
character ö can be the single code point U+00F6, which is two UTF-8
bytes, or lower case o (U+006F) followed by a combining diaeresis
(U+0308), which is three UTF-8 bytes, but it is one column wide either
way.
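
To make the byte/character/width distinction concrete, here is a rough
Python sketch. The helper name describe() is made up for illustration,
and the width rule (skip combining marks, count East Asian Wide and
Fullwidth characters as two columns) is only an approximation, not a
complete display-width algorithm.

```python
import unicodedata

precomposed = "\u00f6"   # ö as a single code point
decomposed = "o\u0308"   # o followed by COMBINING DIAERESIS

def describe(s):
    """Return (code points, UTF-8 bytes, approximate columns) for s.

    Hypothetical helper: the column count is a rough estimate only.
    """
    width = 0
    for ch in s:
        if unicodedata.combining(ch):
            continue  # combining marks take no column of their own
        # Wide (W) and Fullwidth (F) characters occupy two columns.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return len(s), len(s.encode("utf-8")), width

print(describe(precomposed))  # (1, 2, 1)
print(describe(decomposed))   # (2, 3, 1)
print(describe("日本"))        # (2, 6, 4)

# Both spellings of ö normalize to the same precomposed form under NFC.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

Padding a field by bytes, by code points, or by columns gives three
different answers for the same visible text, which is why a byte-oriented
printf(1) cannot line up such strings without doing much more work.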

More information about the freebsd-questions mailing list