Re: Grep with non-ascii

From: Eivind Nicolay Evensen <eivinde_at_terraplane.org>
Date: Fri, 03 Feb 2023 14:26:17 UTC
On Fri, 3 Feb 2023 20:39:48 +0900,
Tomoaki AOKI <junchoon@dec.sakura.ne.jp> wrote:

> On Fri, 3 Feb 2023 11:06:42 +0100
> Eivind Nicolay Evensen <eivinde@terraplane.org> wrote:
> 
> > Hello.
> > 
> > I just noticed this today:
> >   
> > elg!ene[~]> printf "bø\nhei\nøl\n" | grep ø  
> > grep: trailing backslash (\)  
> > elg!ene[~]> echo $LC_CTYPE $LANG  
> > nb_NO.ISO8859-1 nb_NO.ISO8859-1
> > 
> > While I get the result I expected with GNU grep:
> >   
> > elg!ene[~]> printf "bø\nhei\nøl\n" | ggrep ø  
> > bø
> > øl
> > 
> > Also, on OpenIndiana, Linux and NetBSD, grep gives the proper
> > result.
> > 
> > Is lib/libc/regex the right place to look into this if I
> > find the time, or does anybody know this well enough to
> > recognize the problem?
> > 
> > Regards
> > -- 
> > Eivind Nicolay Evensen  
> 
> Possibly a locale problem, or it may depend on which command-line
> shell you are using.
> 
> I tried copying and pasting this to the command line and got the
> result below.
> 
> % printf "bø\nhei\nøl\n" | grep ø
> bø
> øl
> 
> I'm using LC_ALL=ja_JP.UTF-8, LANG=ja_JP.UTF-8 as locale and
> shells/zsh as command line shell.
> 
> What happens if you switch locale to nb_NO.UTF-8?
> 

It does indeed seem to be a locale problem, because it works when
I change the locale:

elg!ene[~]> grep ø
grep: trailing backslash (\)
(I select UTF-8 encoding in the xterm menu here)
elg!ene[~]> setenv LC_CTYPE nb_NO.UTF-8
elg!ene[~]> grep ø
zzz
æøå
æøå
^D

It perhaps affects more ISO8859-1 locales; I just tried these (back to non-UTF-8 encoding in xterm):

elg!ene[~]> setenv LC_CTYPE sv_SE.ISO8859-1
elg!ene[~]> grep 
grep: trailing backslash (\)

and

elg!ene[~]> setenv LC_CTYPE de_DE.ISO8859-1
elg!ene[~]> grep 
grep: trailing backslash (\)
elg!ene[~]> grep 
grep: trailing backslash (\)
elg!ene[~]> 
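
Since the result depends on both the xterm encoding and LC_CTYPE, one way
to take the terminal out of the picture is to spell out the bytes of "ø"
explicitly. The following is only a minimal sketch, assuming a POSIX sh
with printf, od and tr available; the helper name repro_grep is
hypothetical and not part of any tool. It does not claim to show the
cause, only which bytes grep receives in each encoding.

```shell
#!/bin/sh
# "ø" is the single byte 0xF8 (octal 370) in ISO8859-1,
# but the two bytes 0xC3 0xB8 (octal 303 270) in UTF-8.
latin1_hex=$(printf '\370' | od -An -tx1 | tr -d ' \n')
utf8_hex=$(printf '\303\270' | od -An -tx1 | tr -d ' \n')

echo "ISO8859-1 bytes: $latin1_hex"
echo "UTF-8 bytes:     $utf8_hex"

# Hypothetical helper: feed the ISO8859-1 encoded test input to grep
# under the locale given as $1, with the pattern as a raw 0xF8 byte.
# No particular output is assumed here; it depends on the grep in use.
repro_grep() {
    printf 'b\370\nhei\n\370l\n' | LC_ALL="$1" grep "$(printf '\370')"
}
```

After sourcing the file, a call such as `repro_grep nb_NO.ISO8859-1`
could then be compared against `repro_grep C` on the affected system.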


-- 
Eivind Nicolay Evensen