Ctype patch for review
Andrey Chernov
ache at nagual.pp.ru
Wed Sep 19 05:10:30 PDT 2007
On Wed, Sep 19, 2007 at 09:18:30AM +0400, Andrey Chernov wrote:
> I change my mind again, now I use new __mb_bit8_override flag specific to
> UTF-8 encoding (other bit8 overriding encodings could use it too). New
> patch attached.
Improved version. Introduce a general __mb_sch_limit parameter instead, for
all locales, specifying the upper limit of the single char range. It also
allows fixing the bug when ctype(3) functions are called with arg > 0xFF in
wide character locales, and simplifies all checks. New patch is attached.
Here is the updated rationale again:
-------------------------------------------------------------------------
The problem is: currently our single byte ctype(3) functions are broken
for wide character locales in the argument range >= 0x80: they may
return false positives.
Example 1: for UTF-8 locale we currently have:
iswspace(0xA0)==1 and isspace(0xA0)==1
(because iswspace() and isspace() are the same code)
but must have
iswspace(0xA0)==1 and isspace(0xA0)==0
(because in the UTF-8 locale neither this value nor any other in the range
0x80..0xff is a single byte character; only ASCII stays in the single byte
range, since our internal wchar_t representation for UTF-8 is UCS-4).
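
To see what a given C library currently reports for Example 1, a small probe
like the following can help. It is only a sketch: the helper names probe_nbsp()
and print_nbsp() are made up for illustration, and the locale name passed in
(for instance "en_US.UTF-8") is an assumption that depends on what is installed.
The result will differ between systems and libc versions.

```c
#include <ctype.h>
#include <locale.h>
#include <stdio.h>
#include <wctype.h>

/*
 * Hypothetical helper: ask the C library how it classifies 0xA0 in the
 * named locale. Returns 0 on success, -1 if the locale cannot be set.
 */
int probe_nbsp(const char *name, int *narrow, int *wide)
{
    if (setlocale(LC_CTYPE, name) == NULL)
        return -1;
    *narrow = isspace(0xA0) != 0;          /* single byte classifier */
    *wide = iswspace((wint_t)0xA0) != 0;   /* wide character classifier */
    return 0;
}

/* Hypothetical helper: print both answers side by side. */
void print_nbsp(const char *name)
{
    int narrow, wide;

    if (probe_nbsp(name, &narrow, &wide) != 0) {
        printf("%s: locale not available\n", name);
        return;
    }
    printf("%s: isspace(0xA0)=%d iswspace(0xA0)=%d\n", name, narrow, wide);
}
```

Calling print_nbsp("en_US.UTF-8") from main() prints the two values that
Example 1 compares.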
Example 2: for all wide character locales, isalpha(arg) with arg > 0xFF may
return false positives (it must return 0),
because iswalpha() and isalpha() are the same code.
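
Both examples come down to a range check against the single char limit. The
sketch below is not the attached patch; it only illustrates the idea with
made-up names (struct sk_locale, sk_isctype(), sk_iswctype(), sk_init_utf8()).
It assumes the limit is exclusive (0x80 for UTF-8, 0x100 for 8-bit charsets),
which the message does not state, and it simplifies the wide lookup to the
first 256 code points.

```c
#include <stddef.h>

#define SK_SPACE 0x1u   /* hypothetical class bits */
#define SK_ALPHA 0x2u

/* Hypothetical stand-in for per-locale ctype data; not the FreeBSD structures. */
struct sk_locale {
    int sch_limit;            /* assumed exclusive upper limit of single chars */
    unsigned int table[256];  /* class bits shared by narrow and wide lookups */
};

/* Wide lookup, simplified: only the first 256 code points have table entries. */
int sk_iswctype(const struct sk_locale *loc, long wc, unsigned int mask)
{
    if (wc < 0 || wc >= 256)
        return 0;   /* a real implementation would search larger ranges here */
    return (loc->table[wc] & mask) != 0;
}

/* Narrow lookup: same table, but nothing at or above sch_limit is in any class. */
int sk_isctype(const struct sk_locale *loc, int c, unsigned int mask)
{
    if (c < 0 || c >= loc->sch_limit || c >= 256)
        return 0;
    return (loc->table[c] & mask) != 0;
}

/* Toy UTF-8-like locale: U+0020 and U+00A0 are spaces, 'A' is alphabetic. */
void sk_init_utf8(struct sk_locale *loc)
{
    size_t i;

    for (i = 0; i < 256; i++)
        loc->table[i] = 0;
    loc->table[0x20] = SK_SPACE;
    loc->table[0xA0] = SK_SPACE;
    loc->table['A'] = SK_ALPHA;
    loc->sch_limit = 0x80;
}
```

With the toy locale, the wide lookup still accepts 0xA0 as a space while the
narrow lookup rejects it, and any narrow argument above 0xFF is rejected
without touching the table.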
The attached patch addresses this issue and also fixes iswascii()
(currently iswascii() is broken for arguments > 0xFF).
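
For reference, the behaviour expected from iswascii() can be written as a test
on the whole value rather than on its low byte. This is a sketch with a made-up
name, sk_iswascii(), not the macro from the patch; the message does not show
how the existing check goes wrong.

```c
#include <wchar.h>

/*
 * Hypothetical helper: a wide character is ASCII only if every bit above
 * the low seven is clear. For example 0x141 is rejected even though its
 * low byte, 0x41, is the ASCII letter 'A'.
 */
int sk_iswascii(wint_t wc)
{
    return (wc & ~(wint_t)0x7F) == 0;
}
```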
This patch is 100% binary compatible with old binaries.
--
http://ache.pp.ru/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ctype.patch
Type: text/x-diff
Size: 14204 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-i18n/attachments/20070919/52dea025/ctype.bin