From: bugzilla-noreply@freebsd.org
To: standards@FreeBSD.org
Subject: [Bug 257972] collating sequence not sensible in some locales
Date: Fri, 20 Aug 2021 14:13:54 +0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=257972

            Bug ID: 257972
           Summary: collating sequence not sensible in some locales
           Product: Base System
           Version: 13.0-STABLE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Many People
          Priority: ---
         Component: standards
          Assignee: standards@FreeBSD.org
          Reporter: freebsd@oldach.net

As discussed in
https://lists.freebsd.org/archives/freebsd-stable/2021-August/000193.html

> > # uname -a
> > FreeBSD 13STABLE 13.0-STABLE FreeBSD 13.0-STABLE #49 stable/13-n246779-64085efb677-dirty: Mon Aug 16 08:42:53 CEST 2021 root@XXX amd64
> > # export LANG=en_US.ISO8859-1
> > # (echo bla; echo Bla) | grep '[A-Z]'
> > bla
> > Bla
>
> This one is unexpected, the upper case should be a range of its own
> and should not include any lower case letters.
> > # export LANG=en_US.UTF-8
> > # (echo bla; echo Bla) | grep '[A-Z]'
> > Bla
>
> Here I had expected the result you got with en_US.ISO8859-1 ...
> > For comparison, a Linux RHEL box delivers the expected results:
> >
> > # uname -a
> > Linux rhel.local 3.10.0-1062.9.1.el7.x86_64 #1 SMP Mon Dec 2 08:31:54 EST 2019 x86_64 x86_64 x86_64 GNU/Linux
> > # export LANG=en_US.ISO8859-1
> > # (echo bla; echo Bla) | grep '[A-Z]'
> > Bla
> > # export LANG=en_US.UTF-8
> > # (echo bla; echo Bla) | grep '[A-Z]'
> > Bla
>
> Seems that this version uses a POSIX style collating sequence for UTF-8.
> Definitely a bug in the definition of the collating sequences.
>
> And I have just verified that de_DE.ISO8859-1 wrongly considers "ö"
> to be within [a-z], while de_DE.UTF-8 does not (but should).
>
> Seems that the correct collating sequences for ISO8859-1 and UTF-8 are
> each assigned to the other one.

Can some knowledgeable person please validate?
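
For anyone who needs locale-independent matching while this is being looked at, here is a minimal sketch of two common workarounds. It is an illustration only, not taken from the thread above: the `sample` helper and the variable names are made up for this example, and it assumes a POSIX sh and grep.

```shell
#!/bin/sh
# Sketch: match upper-case letters without relying on how the current
# locale orders the characters between 'A' and 'Z'.

# Hypothetical helper reproducing the two test lines from the report.
sample() { echo bla; echo Bla; }

# 1. Force the POSIX ("C") locale for this one command. In that locale
#    the range [A-Z] covers exactly the ASCII upper-case letters.
by_c_locale=$(sample | LC_ALL=C grep '[A-Z]')

# 2. Use a character class instead of a range. Class membership is
#    taken from LC_CTYPE rather than from the collating sequence.
by_class=$(sample | grep '[[:upper:]]')

echo "C locale range: $by_c_locale"
echo "upper class:    $by_class"
```

Both commands should print only `Bla`. The first changes the locale for a single grep invocation, so it also changes how non-ASCII input is interpreted; the second keeps the current locale and is usually the better fit when the input may contain accented letters.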