gnu/116363: isspace broken for UTF-8 locales
Petr Hroudny
petr.hroudny at gmail.com
Sat Sep 15 02:10:02 PDT 2007
>Number: 116363
>Category: gnu
>Synopsis: isspace broken for UTF-8 locales
>Confidential: no
>Severity: non-critical
>Priority: medium
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Sat Sep 15 09:10:02 GMT 2007
>Closed-Date:
>Last-Modified:
>Originator: Petr Hroudny
>Release: 6-stable, 7-current
>Organization:
>Environment:
>Description:
In UTF-8 locales, isspace(0xA0) returns 1, which is wrong.
In UTF-8, 0xA0 can only occur as a continuation byte (the second or a later byte) of a multibyte character, never as a space on its own.
As a consequence, operations like str.upper() and/or str.split() break when they
encounter a UTF-8 character containing the byte 0xA0.
An example of such a character is Scaron (U+0160, encoded in UTF-8 as 0xC5 0xA0).
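The effect described above can be sketched in a few lines of Python. This is only an illustration, not the actual libc or interpreter code: the helper names (is_utf8_continuation, split_bytes) and the two byte sets are hypothetical, and BROKEN_SPACES merely simulates an isspace() that accepts 0xA0.

```python
# Hypothetical helpers for illustration; none of these names come from the report.

def is_utf8_continuation(byte: int) -> bool:
    """Return True if the byte has the form 10xxxxxx (a UTF-8 continuation byte)."""
    return (byte & 0xC0) == 0x80

# Byte values treated as whitespace by a correct byte-wise classifier.
ASCII_SPACES = frozenset(b" \t\n\r\v\f")
# Assumption: simulates the reported behaviour, where 0xA0 also counts as a space.
BROKEN_SPACES = ASCII_SPACES | {0xA0}

def split_bytes(data: bytes, spaces) -> list:
    """Split data on any byte in `spaces`, as a byte-wise isspace() loop would."""
    words, current = [], bytearray()
    for b in data:
        if b in spaces:
            if current:
                words.append(bytes(current))
                current = bytearray()
        else:
            current.append(b)
    if current:
        words.append(bytes(current))
    return words

SCARON = "\u0160".encode("utf-8")   # the two bytes 0xC5 0xA0
text = b"a" + SCARON + b"b c"

correct = split_bytes(text, ASCII_SPACES)   # Scaron stays intact
broken = split_bytes(text, BROKEN_SPACES)   # Scaron is cut after 0xC5
```

With the simulated broken set, the first word ends in a lone 0xC5 lead byte, which is no longer valid UTF-8. That is the kind of corruption the report attributes to isspace(0xA0) returning true.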
>How-To-Repeat:
>Fix:
For UTF-8 locales, 0xA0 should never be considered to be a space.
>Release-Note:
>Audit-Trail:
>Unformatted: