[Bug 257972] collating sequence not sensible in some locales

From: <bugzilla-noreply_at_freebsd.org>
Date: Fri, 20 Aug 2021 15:07:48 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=257972

Stefan Eßer <se@FreeBSD.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |se@FreeBSD.org

--- Comment #2 from Stefan Eßer <se@FreeBSD.org> ---
While it is true that POSIX does not define it for ISO8859-1 or UTF-8, it
always used to work for ISO8859-1 (as a simple extension of ASCII).

The really surprising result is that ISO8859-1 obviously includes lower case
letters in the range [A-Z] (it never did before!), while UTF-8 excludes them
(and the common practice in Unicode is to have a collating sequence of
"aAbBcC..." for Latin-based character sets).

There is obviously code that applies some collating sequence rules, but
opposite to what I'd expect.
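
To illustrate why the collating sequence changes what [A-Z] matches, here is a
minimal Python sketch. It is only a model of the two orderings mentioned above,
not FreeBSD's locale data or its matching code; the names TRADITIONAL,
INTERLEAVED and range_members are made up for this example, and only the 52
ASCII letters are modelled.

```python
import string

# Hypothetical model of the two orderings; not taken from any real locale file.
# Traditional (code point) order: all upper case letters, then all lower case.
TRADITIONAL = string.ascii_uppercase + string.ascii_lowercase
# Interleaved order "aAbBcC...zZ": each lower case letter just before its
# upper case counterpart.
INTERLEAVED = "".join(
    lower + upper
    for lower, upper in zip(string.ascii_lowercase, string.ascii_uppercase)
)


def range_members(order, lo, hi):
    """Return the characters that a bracket range [lo-hi] covers when the
    range is evaluated by position in the collating order `order`."""
    start, end = order.index(lo), order.index(hi)
    return order[start:end + 1]


traditional_az = range_members(TRADITIONAL, "A", "Z")
interleaved_az = range_members(INTERLEAVED, "A", "Z")

print("traditional [A-Z]:", traditional_az)
print("interleaved [A-Z]:", interleaved_az)
```

Under the traditional order the range covers only the upper case letters. Under
the interleaved order it also picks up "b" through "z", but not "a", because
"a" sorts before "A".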

The Linux example shows that they decided to use the traditional collating
sequence in any locale, including ISO8859-1 and UTF-8 (and as said, POSIX does
not care at all).

We could make ISO8859-1 use the traditional collating sequence and UTF-8 the
Unicode convention of each lower case letter just before its upper case
letter, or we could always apply the traditional collating sequence, but we
should definitely not use traditional for UTF-8 and Unicode style for
ISO8859-1.
