gnu/116363: isspace broken for UTF-8 locales

Andrey Chernov ache at nagual.pp.ru
Sun Sep 16 02:06:50 PDT 2007


On Sat, Sep 15, 2007 at 09:08:01AM +0000, Petr Hroudny wrote:
> 
> >Number:         116363
> >Category:       gnu
> >Synopsis:       isspace broken for UTF-8 locales
> >Confidential:   no
> >Severity:       non-critical
> >Priority:       medium
> >Responsible:    freebsd-bugs
> >State:          open
> >Quarter:        
> >Keywords:       
> >Date-Required:
> >Class:          sw-bug
> >Submitter-Id:   current-users
> >Arrival-Date:   Sat Sep 15 09:10:02 GMT 2007
> >Closed-Date:
> >Last-Modified:
> >Originator:     Petr Hroudny
> >Release:        6-stable, 7-current
> >Organization:
> >Environment:
> >Description:
> In UTF-8 locales, isspace(0xA0) returns 1, which is wrong.
> 
> In UTF-8, 0xA0 can only be a continuation byte (the second, third, or fourth byte) of a multibyte character, never a space on its own.
> 
> As a consequence, operations like str.upper() and/or str.split() break when
> a UTF-8 character containing a 0xA0 byte is encountered.

It seems that our UTF-8.src is completely wrong: it is just plain Unicode
and not UTF-8, in which multibyte sequences may start only with the lead bytes
C2-DF (2-byte sequences)
E0-EF (3-byte sequences)
F0-F4 (4-byte sequences)
(as stated in http://en.wikipedia.org/wiki/UTF-8, for example).
Can anybody write a replacement for it?
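
To illustrate the byte ranges in question, here is a minimal Python sketch. It
is not taken from UTF-8.src or from libc; the helper name utf8_byte_role is
hypothetical, and it only classifies a single byte by its possible role in
well-formed UTF-8. It shows that 0xA0 on its own can only be a continuation
byte, while U+00A0 (NO-BREAK SPACE) is encoded as the two bytes C2 A0.

```python
def utf8_byte_role(b: int) -> str:
    """Classify one byte value (0..255) by its role in well-formed UTF-8.

    Hypothetical helper for illustration only; not part of any libc.
    """
    if not 0 <= b <= 0xFF:
        raise ValueError("expected a byte value in 0..255")
    if b <= 0x7F:
        return "ascii"          # single-byte character
    if b <= 0xBF:
        return "continuation"   # 80-BF: never starts a character
    if 0xC2 <= b <= 0xDF:
        return "lead-2"         # starts a 2-byte sequence
    if 0xE0 <= b <= 0xEF:
        return "lead-3"         # starts a 3-byte sequence
    if 0xF0 <= b <= 0xF4:
        return "lead-4"         # starts a 4-byte sequence
    return "invalid"            # C0, C1, F5-FF never appear in UTF-8


# U+00A0 (NO-BREAK SPACE) takes two bytes in UTF-8: C2 A0.
nbsp = "\u00a0".encode("utf-8")
roles = [utf8_byte_role(x) for x in nbsp]
print(nbsp.hex(), roles)
```

Under this classification, a byte-oriented isspace() in a UTF-8 locale has no
reason to report 0xA0 as a space, since that byte never forms a character by
itself.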

-- 
http://ache.pp.ru/

