Why does the en_US.UTF-8 locale consider a < A?

Baptiste Daroussin bapt at freebsd.org
Wed Mar 8 08:40:48 UTC 2017


On Wed, Mar 08, 2017 at 12:28:16AM -0800, Xin Li wrote:
> Hi,
> 
> I recently noticed that sorting the following file behaves differently
> when LANG and LC_CTYPE are set to en_US.UTF-8:
> 
> %%%%%
> 1
> 2
> A
> a
> B
> b
> %%%%
> 
> I got:
> 
> $ LANG=C LC_CTYPE=C sort testcase
> 1
> 2
> A
> B
> a
> b
> $ LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 sort testcase
> 1
> 2
> a
> A
> b
> B
> 
> Is this result correct?  It matches some Debian behavior but not macOS
> behavior.

Yes, the result is correct. The ordering is decided by LC_COLLATE (which falls
back to LANG when unset), not by LC_CTYPE. macOS does not have Unicode
collation; if you want to match the macOS behaviour, you have to set
LC_COLLATE=C.
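To get the macOS-style ordering while keeping the rest of the UTF-8 locale, only the collation category needs to change. A minimal sketch, assuming a POSIX sh and recreating the testcase file from the original message in the current directory:

```shell
# Recreate the test case from the original message (writes ./testcase).
printf '%s\n' 1 2 A a B b > testcase

# LC_ALL, if set, overrides every LC_* variable, so clear it first.
unset LC_ALL

# Keep en_US.UTF-8 for everything else, but use C (byte-order) collation.
LANG=en_US.UTF-8 LC_COLLATE=C sort testcase
```

With LC_COLLATE=C, sort compares bytes, so this should print 1, 2, A, B, a, b, the same order as the LANG=C run in the original message.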

Best regards,
Bapt


More information about the freebsd-hackers mailing list