sort is broken

Per Hedeland per at hedeland.org
Sun Nov 3 12:04:04 UTC 2019


On 2019-11-03 05:19, John R. Levine wrote:
>> In my env, LC_ALL is not set at all.
>>
>> I do have these, but not sure if they make any difference:
>>
>> LANG=en_US.UTF-8
>> XTERM_LOCALE=en_US.UTF-8
>> LESSCHARSET=utf-8
> 
> Try this and see if it's happier:
> 
> export LC_ALL=en_US.UTF-8

According to
https://pubs.opengroup.org/onlinepubs/7908799/xbd/envvar.html (as well
as the sort(1) man page, actually), if no LC_* variables are set, the
LANG setting (if any) is used. And if LC_ALL is set, the setting of
both LANG and all the other LC_* variables is ignored. I.e. your
setting of LC_ALL to the same value as LANG, when no other LC_*
variables are set, should be a no-op.
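
A minimal Python sketch of that precedence, assuming unset and empty
variables are treated alike. The function name effective_locale is made up
for illustration, and the final "C" fallback is only a stand-in for the
implementation-defined default:

```python
def effective_locale(category, env):
    """Return the locale name that applies to `category`, e.g. "LC_COLLATE"."""
    # Precedence: LC_ALL first, then the category's own variable, then LANG.
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:  # skip variables that are unset or empty
            return value
    # Assumption: "C" stands in for the implementation-defined default.
    return "C"


# Ronald's environment: only LANG is set.
before = effective_locale("LC_COLLATE", {"LANG": "en_US.UTF-8"})
# The suggested change: LC_ALL set to the same value as LANG.
after = effective_locale(
    "LC_COLLATE", {"LANG": "en_US.UTF-8", "LC_ALL": "en_US.UTF-8"}
)
print(before == after)
```

Both calls resolve to en_US.UTF-8, which is why the suggested export
should change nothing.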

> I think your problem is that the default C locale is ASCII only.

That isn't relevant to Ronald's problem, since his LANG setting means the
C locale isn't used, but the above page says:

    If the locale value is "C" or "POSIX", the POSIX locale is used and
    the standard utilities behave in accordance with the rules in POSIX
    Locale, for the associated category.

where "POSIX Locale" is a link to
https://pubs.opengroup.org/onlinepubs/7908799/xbd/locale.html#tag_005_002
which says:

   The tables in Locale Definition describe the characteristics and
   behaviour of the POSIX locale for data consisting entirely of
   characters from the portable character set and the control character
   set. For other characters, the behaviour is unspecified. For
   C-language programs, the POSIX locale is the default locale when the
   setlocale() function is not called.

I.e. it does indeed specify the behavior only for ASCII ("the portable
character set and the control character set"), so in principle 'sort'
could give an error if characters outside that set are present. But as
I showed in an earlier posting, 'sort' has no problem with Ronald's
ISO-8859-1, non-ASCII, character when LANG is set to "C" - presumably
it just uses the full 8-bit byte values, since that is the correct
behavior for ASCII.
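
To illustrate what ordering by raw byte value looks like, here is a small
Python sketch. It is not how sort(1) is implemented, and the sample lines
are invented, but it shows where an ISO-8859-1 character ends up when
bytes are compared as unsigned 8-bit values:

```python
# Hypothetical input lines, ISO-8859-1 encoded; 0xE9 is "e" with an
# acute accent, which lies outside ASCII.
lines = [b"z", b"\xe9", b"a", b"Z"]

# Python compares bytes objects by unsigned byte value, which is the
# ordering assumed here for the C locale.
by_byte_value = sorted(lines)
print(by_byte_value)  # [b'Z', b'a', b'z', b'\xe9']
```

The non-ASCII byte sorts after every ASCII character, and the ASCII
characters keep the order they would have anyway.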

--Per


More information about the freebsd-questions mailing list