Date: Fri, 3 Feb 2023 15:26:17 +0100
From: Eivind Nicolay Evensen
To: Tomoaki AOKI
Cc: stable@freebsd.org
Subject: Re: Grep with non-ascii
Message-ID: <20230203152617.00e01686@elg.hjerdalen.lokalnett>
In-Reply-To: <20230203203948.23d66303bcae8c528202071a@dec.sakura.ne.jp>
List-Id: Production branch of FreeBSD source code
List-Archive: https://lists.freebsd.org/archives/freebsd-stable

On Fri, 3 Feb 2023 20:39:48 +0900, Tomoaki AOKI wrote:

> On Fri, 3 Feb 2023 11:06:42 +0100
> Eivind Nicolay Evensen wrote:
>
> > Hello.
> >
> > I just noticed this today:
> >
> > elg!ene[~]> printf "bø\nhei\nøl\n" | grep ø
> > grep: trailing backslash (\)
> > elg!ene[~]> echo $LC_CTYPE $LANG
> > nb_NO.ISO8859-1 nb_NO.ISO8859-1
> >
> > Whereas I get the result I expected with gnugrep:
> >
> > elg!ene[~]> printf "bø\nhei\nøl\n" | ggrep ø
> > bø
> > øl
> >
> > Also, on OpenIndiana, Linux and NetBSD, grep gives the proper
> > result.
> >
> > Is lib/libc/regex the right place to look into this if I
> > find the time, or does anybody know this code well enough to
> > know what the problem is?
> >
> > Regards
> > --
> > Eivind Nicolay Evensen
>
> Possibly a locale problem, or it depends on which command-line shell
> you are using.
>
> I tried copy/pasting it to the command line and got the result below.
>
> % printf "bø\nhei\nøl\n" | grep ø
> bø
> øl
>
> I'm using LC_ALL=ja_JP.UTF-8, LANG=ja_JP.UTF-8 as locale and
> shells/zsh as command-line shell.
>
> What happens if you switch locale to nb_NO.UTF-8?
>

It does indeed seem to be a locale problem, because it works when I
change it:

elg!ene[~]> grep ø
grep: trailing backslash (\)
(I select UTF-8 encoding in the xterm menu here)
elg!ene[~]> setenv LC_CTYPE nb_NO.UTF-8
elg!ene[~]> grep ø
zzz
æøå
æøå
^D

Perhaps more locales are affected; I just tried this (back to
non-UTF-8 encoding in xterm):

elg!ene[~]> setenv LC_CTYPE sv_SE.ISO8859-1
elg!ene[~]> grep
grep: trailing backslash (\)

and

elg!ene[~]> setenv LC_CTYPE de_DE.ISO8859-1
elg!ene[~]> grep
grep: trailing backslash (\)
elg!ene[~]> grep
grep: trailing backslash (\)
elg!ene[~]>

--
Eivind Nicolay Evensen
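A reproducer that does not depend on the xterm encoding setting may make the locale theory easier to test. This is a minimal sketch, not part of the original report: it assumes a POSIX sh and that the listed ISO8859-1 locales are installed, and it produces the ISO-8859-1 byte for "ø" (0xF8, octal 370) with printf instead of typing it, so the terminal cannot re-encode it. The variable name oslash is illustrative only.

```shell
#!/bin/sh
# Sketch: run the same search under several locales without relying on the
# terminal's encoding. Assumes these locales exist (check with `locale -a`).

# In ISO-8859-1, "ø" is the single byte 0xF8 (octal 370).
oslash=$(printf '\370')

for loc in C nb_NO.ISO8859-1 sv_SE.ISO8859-1 de_DE.ISO8859-1; do
    echo "== LC_ALL=$loc"
    # Same input as in the report: "bø", "hei", "øl", one per line.
    # -e keeps the pattern from ever being parsed as an option.
    printf 'b\370\nhei\n\370l\n' | LC_ALL=$loc grep -e "$oslash"
done
```

On a system without the problem, each locale should print the two lines containing the 0xF8 byte. If the behavior matches the report, the ISO8859-1 runs on the affected FreeBSD system would print the "trailing backslash" error instead, while the C run, which does not depend on the ISO8859-1 locale data, serves as a baseline. A locale that is not installed makes grep silently fall back to the C locale, so it is worth confirming with `locale -a` first.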