From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 256473] FreeBSD shells are case insensitive for character ranges
Date: Tue, 08 Jun 2021 18:48:23 +0000
X-Bugzilla-Product: Base System
X-Bugzilla-Component: bin
X-Bugzilla-Version: 12.2-STABLE
X-Bugzilla-Severity: Affects Some People
X-Bugzilla-Who: se@FreeBSD.org
X-Bugzilla-Status: Closed
X-Bugzilla-Resolution: Works As Intended
X-Bugzilla-Assigned-To: bugs@FreeBSD.org

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=256473

--- Comment #7 from Stefan Eßer ---
(In reply to Jason W. Bacon from comment #6)
> I see the pattern now, but your range expansion above is incorrect and doesn't agree with the ls output I provided.
>
> The lower case letters actually come first, which is not what I expected either. That's why the output seemed inexplicable at first.
>
> [A-Z] == [AbB..zZ] == all letters except 'a'
> [a-z] == [aAbB..z] == all letters except 'Z'
>
> [A-Z]* selects for all but those that start with 'a', not 'z'. This explains why zip is listed and aardvark is not.

It seems your collating sequence has lower-case letters before upper-case letters, which is in fact very common (I got that reversed).

But Unicode collation sequences are much more complex than that. For example, many languages sort by character without regard to upper/lower case, and case only comes into play if the case-insensitive comparison does not define an ordering.
E.g., in /usr/ports:

    $ /bin/ls -1d [cC]*
    cad
    CHANGES
    chinese
    comms
    CONTRIBUTING.md
    converters
    COPYRIGHT

Case is ignored as long as the case-insensitive comparison gives a result, and that makes "cad" come before "CHANGES", which is in turn followed by "chinese". This shows that the order is not primarily determined by the case of the initial character ("c" vs. "C"), but by comparing the full name, with upper/lower case used only as a less significant criterion. And that makes "[C]*" behave differently from looking at the sorted list and starting at the first entry that has "C" as its initial letter.

Anyway, all of this is specified by the Unicode Collation Algorithm (UCA). Each locale definition specifies parameters of that algorithm, and the order you observe complies with that specification (you did not say which locale you use, e.g. the value of LANG that is in effect).

There is nothing wrong with the FreeBSD shells, but if the default does not work for you, you may have to set an environment variable (LC_COLLATE) to a value that results in the sort order you expect; for example, LC_COLLATE=C gives plain byte-value order.
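The following Python sketch is a toy model only, not the UCA and not FreeBSD's actual locale data. It assumes a collation sequence a < A < b < B < ... < z < Z and a two-level comparison (case-insensitive first, case only as a tie-breaker, lower case first). Under those assumptions it reproduces both the /usr/ports listing and the quoted observation that [A-Z] excludes only 'a', and contrasts them with plain code-point order, which is what the C locale uses and what Python's fnmatch.fnmatchcase uses for bracket ranges. The helper names (toy_range, two_level_key, toy_glob_first_letter) are made up for this illustration, and "Zebra" is an invented sample word.

```python
"""Toy model of locale collation vs. code-point order for glob ranges.

Assumptions (illustration only, not a real locale definition or the UCA):
  * letters collate as a < A < b < B < ... < z < Z;
  * whole names compare case-insensitively first, and case only breaks ties,
    with lower case sorting before upper case.
"""
import fnmatch
import string

# Hypothetical collation sequence: each lower-case letter immediately
# precedes its upper-case counterpart.
TOY_ORDER = [c for pair in zip(string.ascii_lowercase, string.ascii_uppercase)
             for c in pair]
TOY_RANK = {c: i for i, c in enumerate(TOY_ORDER)}


def letters_in_range(lo, hi, key):
    """Letters x with key(lo) <= key(x) <= key(hi), listed in key order."""
    return [c for c in sorted(string.ascii_letters, key=key)
            if key(lo) <= key(c) <= key(hi)]


def toy_range(lo, hi):
    """Bracket range [lo-hi] evaluated by position in the toy sequence."""
    return letters_in_range(lo, hi, TOY_RANK.__getitem__)


def codepoint_range(lo, hi):
    """Bracket range [lo-hi] evaluated by code point, as in the C locale."""
    return letters_in_range(lo, hi, ord)


def two_level_key(name):
    """Primary: case-insensitive text; secondary: upper case after lower."""
    return (name.casefold(), [c.isupper() for c in name])


def toy_glob_first_letter(names, lo, hi):
    """Names whose first character falls in the toy range [lo-hi]."""
    allowed = set(toy_range(lo, hi))
    return [n for n in names if n[:1] in allowed]


if __name__ == "__main__":
    ports = ["cad", "CHANGES", "chinese", "comms", "CONTRIBUTING.md",
             "converters", "COPYRIGHT"]
    print(sorted(ports))                     # plain code-point order
    print(sorted(ports, key=two_level_key))  # same order as the ls listing

    print(toy_range("A", "Z")[:4])           # ['A', 'b', 'B', 'c']
    print("a" in toy_range("A", "Z"), "Z" in toy_range("a", "z"))  # False False

    words = ["aardvark", "zip", "Zebra"]
    print(toy_glob_first_letter(words, "A", "Z"))                  # ['zip', 'Zebra']
    print([w for w in words if fnmatch.fnmatchcase(w, "[A-Z]*")])  # ['Zebra']
```

In the real shell, the counterpart of switching from the toy order to code-point order would be running it with LC_COLLATE=C; note that LC_ALL, if set, takes precedence over LC_COLLATE.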