gnu/116363: isspace broken for UTF-8 locales
Andrey Chernov
ache at nagual.pp.ru
Sun Sep 16 09:34:10 PDT 2007
On Mon, Sep 17, 2007 at 01:22:14AM +0900, Hye-Shik Chang wrote:
> In fact, UTF-8.src defines values not for UTF-8 but for Unicode codepoints.
> Using the Unicode codepoint as wchar_t's internal representation gives
> much benefit. I think we would do better to make isspace() and the
> other ctype functions aware of "encoding". IIRC, tjr@ provided the
> workaround as in the URL mentioned above and said in 2004 that it
> would get a chance to be fixed in 6 or 7.
Currently wchar_t represents the given encoding everywhere, including the
wc<->mbr conversions. Making it UCS-4-only instead would mean rewriting the
whole locale system from scratch, and I see no benefit in going that way.
No simple workaround exists.
In any case, there is no excuse for a really-UCS-4.src to masquerade as
UTF-8.src. Providing a proper UTF-8.src is much less painful than rewriting
the whole locale system, and I am almost halfway through converting the
UCS-4 source to it.
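
To illustrate why a table keyed by Unicode codepoints and a table keyed by
encoded values cannot be used interchangeably, here is a rough sketch. It is
not FreeBSD libc code: decode_utf8() and pack_bytes() are hypothetical
helpers written only for this example, and packing the raw bytes into an
integer is just one assumed way an "encoding-valued" wide character could
look.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: decode one UTF-8 sequence (1 to 4 bytes) into a
 * Unicode codepoint.  Returns the number of bytes consumed, or 0 if the
 * input is truncated or malformed.  Overlong forms and surrogates are
 * not rejected; this is an illustration, not a validator.
 */
size_t
decode_utf8(const unsigned char *s, size_t n, uint32_t *cp)
{
	size_t i, len;
	uint32_t v;

	assert(s != NULL && cp != NULL);
	if (n == 0)
		return 0;

	if (s[0] < 0x80) {
		len = 1; v = s[0];
	} else if ((s[0] & 0xE0) == 0xC0) {
		len = 2; v = s[0] & 0x1F;
	} else if ((s[0] & 0xF0) == 0xE0) {
		len = 3; v = s[0] & 0x0F;
	} else if ((s[0] & 0xF8) == 0xF0) {
		len = 4; v = s[0] & 0x07;
	} else {
		return 0;		/* stray continuation or invalid lead byte */
	}
	if (len > n)
		return 0;		/* truncated sequence */

	for (i = 1; i < len; i++) {
		if ((s[i] & 0xC0) != 0x80)
			return 0;	/* not a continuation byte */
		v = (v << 6) | (s[i] & 0x3F);
	}
	*cp = v;
	return len;
}

/*
 * Hypothetical helper: pack up to 4 raw bytes, first byte most
 * significant, into one integer.  This stands in for a wide character
 * whose value is the encoded form rather than the codepoint.
 */
uint32_t
pack_bytes(const unsigned char *s, size_t len)
{
	size_t i;
	uint32_t v = 0;

	for (i = 0; i < len && i < 4; i++)
		v = (v << 8) | s[i];
	return v;
}
```

For NO-BREAK SPACE, whose UTF-8 bytes are C2 A0, decode_utf8() yields the
codepoint 0xA0 while pack_bytes() yields 0xC2A0. A SPACE entry written
for one of those values will never match a lookup made with the other,
which is why the source file has to agree with whatever wchar_t actually
holds.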
--
http://ache.pp.ru/