Re: Confusion with grep & locale?

From: parv/freebsd <parv.0zero9+freebsd_at_gmail.com>
Date: Fri, 20 Aug 2021 09:36:25 UTC
On Thu, Aug 19, 2021 at 11:04 PM Helge Oldach  wrote:
...

> # uname -a
> FreeBSD 13STABLE 13.0-STABLE FreeBSD 13.0-STABLE #49
> stable/13-n246779-64085efb677-dirty: Mon Aug 16 08:42:53 CEST 2021
>  root@XXX  amd64
> # export LANG=en_US.ISO8859-1
> # (echo bla; echo Bla) | grep '[A-Z]'
> bla
> Bla
> # export LANG=C
> # (echo bla; echo Bla) | grep '[A-Z]'
> Bla
> # export LANG=en_US.UTF-8
> # (echo bla; echo Bla) | grep '[A-Z]'
> Bla
> #
>
> For comparison, a Linux RHEL box delivers the expected results:
>
> # uname -a
> Linux rhel.local 3.10.0-1062.9.1.el7.x86_64 #1 SMP Mon Dec 2 08:31:54 EST
> 2019 x86_64 x86_64 x86_64 GNU/Linux
> # export LANG=en_US.ISO8859-1
> # (echo bla; echo Bla) | grep '[A-Z]'
> Bla
> # export LANG=C
> # (echo bla; echo Bla) | grep '[A-Z]'
> Bla
> # export LANG=en_US.UTF-8
> # (echo bla; echo Bla) | grep '[A-Z]'
> Bla
> #
>
> There is nothing special in the environment, specifically no LC_xxx nor
> MM_CHARSET in either case.
>
> Any guidance is appreciated... Thanks!
>

Please file a PR, if one does not already exist, about FreeBSD grep(1)
producing unexpected results under some locales.

If desired, workarounds that avoid FreeBSD grep built with the base
regex(3) library include:
  - build base grep against the gnugrep library from ports; or

  - use gnugrep (installed as /usr/local/bin/grep ;-<), ack,
    the_silver_searcher, or similar tools until regex(3) is fixed (which
    does not look likely to happen by the 13.1 release).
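For the PR, a small script that reruns the test case from the report under several locales may help. The sketch below is an assumption of how such a reproducer could look, not an existing tool: `range_matches` is a hypothetical helper name, and `GREP` is a hypothetical override variable (e.g. `GREP=/usr/bin/grep`) so that base grep, rather than gnugrep from ports shadowing it in PATH, is the one exercised. LC_ALL, LC_CTYPE and LC_COLLATE are cleared in a subshell so that LANG alone selects the locale, as in the original report.

```shell
#!/bin/sh
# Reproducer sketch: report which of "bla" and "Bla" grep(1) selects for
# the bracket range [A-Z] under each LANG value given as an argument.

# Print the lines selected by grep for the given LANG value.
# GREP is a hypothetical override; it defaults to whatever grep is in PATH.
range_matches() {
    (
        # Clear overrides so only LANG decides the locale, as in the report.
        unset LC_ALL LC_CTYPE LC_COLLATE
        LANG=$1
        export LANG
        printf 'bla\nBla\n' | "${GREP:-grep}" '[A-Z]'
    )
}

# One output line per locale: "<locale>: <selected lines>".
for loc in "$@"; do
    printf '%s: %s\n' "$loc" "$(range_matches "$loc" | tr '\n' ' ')"
done
```

Run it as, e.g., `GREP=/usr/bin/grep sh repro.sh C en_US.ISO8859-1 en_US.UTF-8` and compare each line with the transcripts quoted above; only `Bla` should appear after each locale name.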


- parv