printf(1) and UTF-8 multi-byte chars
John R. Levine
johnl at iecc.com
Sun Oct 18 18:05:49 UTC 2020
> There are good reasons for using all three levels, here are some:
>
> Bytes: Content length headers, malloc calls - storage related
Sure.
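A minimal Python sketch of why the byte level matters for storage. The UTF-8 byte count, which is what a Content-Length header or a malloc() size needs, stops matching the code point count as soon as non-ASCII characters appear. The sample string and the helper name `utf8_sizes` are illustrative assumptions, not anything from the thread.

```python
def utf8_sizes(s: str) -> tuple:
    """Return (UTF-8 byte count, code point count) for a string."""
    return len(s.encode("utf-8")), len(s)


# "naïve café" written with precomposed U+00EF and U+00E9
sample = "na\u00efve caf\u00e9"
print(utf8_sizes(sample))  # (12, 10): each accented letter takes 2 bytes
```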
> Glyphs: Truncation, apparent length, sorting - appearance related
Not so much. I suppose it's preferable to truncate at a glyph boundary,
but sorting UTF-8 byte strings gives you the same order as sorting by code point,
and for useful sorting you need to deal with issues like normalization forms
and case folding. Not sure what use apparent length would be since the
number of glyphs tells you neither the number of visible characters nor
how wide they are.
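Two of the points above can be checked in a few lines of Python. The sketch below (helper names are hypothetical) truncates a UTF-8 byte string without splitting a multi-byte sequence. Note that this is a code point boundary, which is weaker than a glyph boundary: a combining accent can still be cut off from its base letter. It also checks that sorting UTF-8 byte strings matches sorting by code point, and shows a rough normalization plus case folding key. Real collation (the Unicode Collation Algorithm or a locale's rules) is considerably more involved.

```python
import unicodedata


def truncate_utf8(data: bytes, limit: int) -> bytes:
    """Cut valid UTF-8 `data` to at most `limit` bytes (limit >= 0)
    without splitting a multi-byte sequence."""
    if len(data) <= limit:
        return data
    cut = limit
    # data[cut] is the first byte that would be dropped. If it is a
    # continuation byte (10xxxxxx), the cut falls inside a character,
    # so back up to that character's lead byte.
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut]


def rough_sort_key(s: str) -> str:
    """Hypothetical key: NFC normalization plus case folding.
    Not a substitute for locale-aware collation."""
    return unicodedata.normalize("NFC", s).casefold()


words = ["z", "\u00e9", "e", "\u4e2d", "\U0001F600"]
# Python compares str by code point, so these two orders agree.
assert sorted(words, key=lambda w: w.encode("utf-8")) == sorted(words)

print(truncate_utf8("caf\u00e9".encode("utf-8"), 4))  # b'caf'
print(sorted(["b", "A", "a"], key=rough_sort_key))     # ['A', 'a', 'b']
```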
> Unicode Characters: UTF-8/16/32 conversions - encoding related
That and a lot of composition and display issues.
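A short Python illustration of the composition and encoding points. The same visible é can be one code point (U+00E9) or two (e followed by U+0301 COMBINING ACUTE ACCENT); the two forms compare unequal until normalized, and each encoding form stores them in a different number of bytes. The helper `encoded_sizes` is a hypothetical name, and the little-endian codec names are used only so Python does not prepend a byte order mark.

```python
import unicodedata

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # e + COMBINING ACUTE ACCENT


def encoded_sizes(text: str) -> dict:
    """Byte length of `text` in each Unicode encoding form."""
    return {enc: len(text.encode(enc))
            for enc in ("utf-8", "utf-16-le", "utf-32-le")}


# Same glyph on screen, different code point sequences.
print(len(precomposed), len(decomposed))                         # 1 2
print(precomposed == decomposed)                                 # False
print(unicodedata.normalize("NFC", decomposed) == precomposed)   # True

# U+1F600 is outside the BMP, so UTF-16 needs a surrogate pair for it.
for text in (precomposed, decomposed, "\U0001F600"):
    print(ascii(text), encoded_sizes(text))
```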
Regards,
John Levine, johnl at taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly