[Bug 266001] uniq says it's affected by LC_COLLATE, must not be according to POSIX

From: <bugzilla-noreply_at_freebsd.org>
Date: Tue, 23 Aug 2022 11:51:46 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=266001

            Bug ID: 266001
           Summary: uniq says it's affected by LC_COLLATE, must not be
                    according to POSIX
           Product: Base System
           Version: CURRENT
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Many People
          Priority: ---
         Component: bin
          Assignee: bugs@FreeBSD.org
          Reporter: nabijaczleweli@nabijaczleweli.xyz

The manual says:
-- >8 --
ENVIRONMENT
     The LANG, LC_ALL, LC_COLLATE and LC_CTYPE environment variables affect
     the execution of uniq as described in environ(7).
-- >8 --

This, presumably, means that uniq compares lines with strcoll(3) or equivalent.
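
To make the distinction concrete, here is a minimal sketch of the two comparison strategies in question. It is not taken from the uniq source; the helper names lines_identical and lines_collate_equal are hypothetical and exist only for illustration.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: byte-for-byte identity, independent of any locale. */
static bool lines_identical(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

/*
 * Hypothetical helper: equality under the current LC_COLLATE setting.
 * Two distinct byte strings may compare equal here if the locale's
 * collation rules give them the same rank.
 */
static bool lines_collate_equal(const char *a, const char *b)
{
    return strcoll(a, b) == 0;
}
```

In the "C" locale, which is what a program gets unless it calls setlocale(3), strcoll(3) orders strings the same way strcmp(3) does, so the two helpers can only disagree once another collation locale has been selected.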

Compare POSIX Issue 7, uniq, DESCRIPTION
(https://pubs.opengroup.org/onlinepubs/9699919799/utilities/uniq.html):
> The second and succeeding copies of repeated adjacent input lines shall not be written.

And APPLICATION USAGE:
> To remove duplicate lines based on whether they collate equally instead of whether they are identical, applications should use:
>   sort -u

Indeed, POSIX Issue 8 (Draft 2.1), following Austin Group Bug 1070
(https://www.austingroupbugs.net/view.php?id=1070), explicitly clarifies this:
> If the collating sequence of the current locale does not have a total ordering of all characters, the behavior of sort | uniq differs from sort -u, as uniq treats lines as duplicates only if they are identical, whereas sort -u treats lines as duplicates if they collate equally.
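
The difference described in that text can be sketched as an adjacent-duplicate filter with a pluggable equality test. This is an illustration only, not the uniq or sort implementation: all names are hypothetical, and the case-insensitive comparison via strcasecmp(3) is merely an assumed stand-in for a locale in which distinct strings collate equally.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>    /* strcasecmp (POSIX) */

typedef bool (*same_line_fn)(const char *, const char *);

/* Identity comparison: what the quoted POSIX text requires of uniq. */
static bool same_bytes(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

/*
 * Assumed stand-in for "collates equally": treats lines differing only
 * in ASCII case as equal, mimicking a collation without a total order.
 */
static bool same_ignoring_case(const char *a, const char *b)
{
    return strcasecmp(a, b) == 0;
}

/*
 * Compacts lines[0..n) in place, keeping the first line of each run of
 * adjacent lines that `same` reports as equal. Returns the new count.
 */
static size_t drop_adjacent_duplicates(const char **lines, size_t n,
                                       same_line_fn same)
{
    if (n == 0)
        return 0;

    size_t kept = 1;
    for (size_t i = 1; i < n; i++) {
        if (!same(lines[kept - 1], lines[i]))
            lines[kept++] = lines[i];
    }
    return kept;
}
```

Given the adjacent lines "abc", "ABC", "abd", the identity test keeps all three, while the stand-in collation test drops "ABC". That is the sort | uniq versus sort -u discrepancy in miniature.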
