sort is broken

Per Hedeland per at hedeland.org
Sun Nov 3 02:38:31 UTC 2019


On 2019-11-03 02:37, Ronald F. Guilmette wrote:
> In message <20191102233528.CFE66E4728E at ary.local>, you wrote:
> 
>> In article <7668.1572729288 at segfault.tristatelogic.com> you write:
>>> Not a question, just an expression of grief and deep dismay.
>>>
>>> It is a sad day when even very fundamental tools, used in billions
>>> of scripts, such as /usr/bin/sort turn up broken.
>>>
>>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=241679
>>
>> I tried it on 11.3 and 12.0 and it works fine.
>>
>> What's in your environment, particularly what's LC_ALL set to?
> 
> In my env, LC_ALL is not set at all.
> 
> I do have these, but not sure if they make any difference:
> 
> LANG=en_US.UTF-8

This, in combination with trying to sort a file whose contents
*aren't* valid UTF-8, is the reason for the behavior you observe - see
my previous post.
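
If you want to check whether a given file is valid UTF-8 before
sorting it, one rough way is to ask iconv to convert it from UTF-8 to
UTF-8 and look at the exit status. This is only a sketch: the helper
name is_valid_utf8 and the sample file contents are made up for
illustration, and it assumes your iconv exits non-zero when it hits an
invalid input sequence.

```shell
# Succeeds (exit status 0) only if iconv can decode the whole file as UTF-8.
# is_valid_utf8 is a hypothetical helper name, not an existing utility.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Hypothetical sample: "caf" followed by the single ISO-8859-1 byte 0xE9,
# which is not a complete UTF-8 sequence on its own.
sample=$(mktemp)
printf 'caf\351\n' > "$sample"

if is_valid_utf8 "$sample"; then
    echo "valid UTF-8"
else
    echo "not valid UTF-8"
fi
rm -f "$sample"
```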

The specification of how LANG and the LC_* variables (should) interact
can be found at
https://pubs.opengroup.org/onlinepubs/7908799/xbd/envvar.html - I
believe setting only LANG is the "normal" way to specify a locale.
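
As a rough illustration of the precedence described there, for
collation a non-empty LC_ALL wins, then LC_COLLATE, then LANG. The
function name effective_collate is made up, and the final fallback to
"C" is an assumption, since the default is implementation-defined:

```shell
# Print the locale name that governs collation, following the documented
# precedence: LC_ALL, then LC_COLLATE, then LANG.
# effective_collate is a hypothetical helper name.
effective_collate() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_COLLATE:-}" ]; then
        printf '%s\n' "$LC_COLLATE"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        # Assumption: the implementation-defined default is shown as "C".
        printf '%s\n' "C"
    fi
}

effective_collate
```

With only LANG=en_US.UTF-8 set, as in your environment, this should
print en_US.UTF-8. The 'locale' command, run without arguments, reports
the same information for all categories.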

If you convert your file to UTF-8, e.g. using the strange behavior of
'sort':

$ sort test > test.utf8

- or more "properly" (assuming you have the libiconv package
installed):

$ iconv -f ISO-8859-1 -t UTF-8 test > test.utf8

- you will find that the test.utf8 file is handled correctly by
'sort', both as filename argument and as stdin.
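
One way to check that for yourself is to sort the file both ways and
compare the results. This is just a sketch: sort_paths_agree is a
made-up helper name, and it only tells you whether the two invocations
produce identical output, not whether that output is in the order you
expect.

```shell
# Sort a file once as a filename argument and once via stdin, then
# compare the two outputs byte for byte.
# sort_paths_agree is a hypothetical helper name.
sort_paths_agree() {
    by_arg=$(mktemp)
    by_stdin=$(mktemp)
    sort "$1" > "$by_arg"
    sort < "$1" > "$by_stdin"
    cmp -s "$by_arg" "$by_stdin"
    status=$?
    rm -f "$by_arg" "$by_stdin"
    return "$status"
}

# Usage, assuming test.utf8 exists in the current directory.
if [ -f test.utf8 ]; then
    if sort_paths_agree test.utf8; then
        echo "same output either way"
    else
        echo "outputs differ"
    fi
fi
```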

> XTERM_LOCALE=en_US.UTF-8

This - which is actually set by xterm based on how it was started -
implies that your xterm will decode UTF-8 and display the "real"
character.

--Per Hedeland

