[Bug 256473] FreeBSD shells are case insensitive for character ranges

From: <bugzilla-noreply_at_freebsd.org>
Date: Tue, 08 Jun 2021 18:48:23 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=256473

--- Comment #7 from Stefan Eßer <se@FreeBSD.org> ---
(In reply to Jason W. Bacon from comment #6)

> I see the pattern now, but your range expansion above is incorrect and doesn't agree with the ls output I provided.
> 
> The lower case letters actually come first, which is not what I expected either.  That's why the output seemed inexplicable at first.
> 
> [A-Z] == [AbB..zZ] == all letters except 'a'
> [a-z] == [aAbB..z] == all letters except 'Z'
> 
> [A-Z]* selects for all but those that start with 'a', not 'z'.  This explains why zip is listed and aardvark is not.

It seems your collating sequence does in fact place lower case letters before
upper case letters, which is very common (I got that reversed).

But Unicode collation sequences are much more complex than that.

For example, many locales first compare strings without regard to upper/lower
case, and case comes into play only if that case-insensitive comparison leaves
the strings equal.

E.g., in /usr/ports:

$ /bin/ls -1d [cC]*
cad
CHANGES
chinese
comms
CONTRIBUTING.md
converters
COPYRIGHT

Case is ignored as long as the case-insensitive comparison already gives a
result, and that makes "cad" come before "CHANGES", which in turn is followed
by "chinese".

This shows that the order is not primarily determined by the case of the
initial character "c" vs. "C", but by comparing the full name and then using
upper/lower case only as a less significant criterion.

And that makes "[C]*" behave differently from what you would get by looking at
the sorted list and starting at the first entry that has "C" as its initial
letter.
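
A rough way to see the two-level idea without depending on any installed
locale is to compare plain byte order with a case-folded sort. This is only a
sketch: "sort -f" in the C locale is an approximation of "compare without
case first", not the UCA itself, and the variable names are made up for the
illustration. The file names are the ones from the listing above.

```sh
# The names from the /usr/ports listing, one per line.
names='cad
CHANGES
chinese
comms
CONTRIBUTING.md
converters
COPYRIGHT'

# Plain byte order: every upper case name sorts before any lower case name.
bytewise=$(printf '%s\n' "$names" | LC_ALL=C sort)

# Case folded for the comparison; case could only break ties (none here).
folded=$(printf '%s\n' "$names" | LC_ALL=C sort -f)

printf 'bytewise:\n%s\n\nfolded:\n%s\n' "$bytewise" "$folded"
```

The folded result reproduces the interleaved order that ls printed, while the
bytewise result groups CHANGES, CONTRIBUTING.md and COPYRIGHT ahead of all
lower case names.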

Anyway, this is all specified by the Unicode collation algorithm (UCA). Each
locale definition supplies the parameters of that algorithm, and the order you
observe complies with that specification (you did not state your locale, e.g.
the LANG value that is in effect).

There is nothing wrong with the FreeBSD shells, but you may have to set an
environment variable (LC_COLLATE) to a value that results in the sort order
you expect, if the default does not work for you.
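
As a minimal sketch of that, assuming the byte-order "C" collation is what you
want: the file names below are hypothetical stand-ins modelled on the ones in
the report, and the scratch directory exists only for the demonstration.

```sh
# Scratch directory with hypothetical file names modelled on the report.
dir=$(mktemp -d)
touch "$dir/aardvark" "$dir/Apple" "$dir/zip" "$dir/Zebra"

# Run in a subshell so the locale change does not leak into the caller.
matches=$(
    cd "$dir" || exit 1
    unset LC_ALL        # LC_ALL, if set, would override LC_COLLATE
    LC_COLLATE=C
    export LC_COLLATE
    echo [A-Z]*         # in the C locale this range is upper case ASCII only
)

echo "$matches"
rm -rf "$dir"
```

With the C collation in effect, only the names that start with an upper case
letter should match. LC_ALL takes precedence over LC_COLLATE when it is set,
which is why the sketch clears it first.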
