Date: Fri, 3 Feb 2023 20:39:48 +0900
From: Tomoaki AOKI
To: stable@freebsd.org
Subject: Re: Grep with non-ascii

On Fri, 3 Feb 2023 11:06:42 +0100
Eivind Nicolay Evensen wrote:

> Hello.
>
> I just noticed this today:
>
> elg!ene[~]> printf "bø\nhei\nøl\n" | grep ø
> grep: trailing backslash (\)
> elg!ene[~]> echo $LC_CTYPE $LANG
> nb_NO.ISO8859-1 nb_NO.ISO8859-1
>
> While I have the result I envisioned with gnugrep:
>
> elg!ene[~]> printf "bø\nhei\nøl\n" | ggrep ø
> bø
> øl
>
> Also, on OpenIndiana, linux and Netbsd, grep gives the proper result.
>
> Is lib/libc/regex the right place to look into this if I
> find the time, or does anybody know this enough to know the
> problem?
>
> Regards
> --
> Eivind Nicolay Evensen

Possibly a locale problem, or it may depend on which command-line shell
you are using.

I copied and pasted your command into my command line and got the
result below.

% printf "bø\nhei\nøl\n" | grep ø
bø
øl

I'm using LC_ALL=ja_JP.UTF-8 and LANG=ja_JP.UTF-8 as my locale, and
shells/zsh as my command-line shell.

What happens if you switch the locale to nb_NO.UTF-8?

--
Tomoaki AOKI
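
To take the terminal and the shell out of the picture when comparing locales, the test can be written with explicit bytes instead of a pasted "ø". The following is only a rough sketch: the function names (latin1_bytes, utf8_bytes, try_grep) are made up for illustration, and it assumes both locales are installed (check with `locale -a`). ISO8859-1 encodes "ø" as the single byte 0xF8 (octal 370), while UTF-8 encodes it as 0xC3 0xB8 (octal 303 270).

```sh
#!/bin/sh
# Hypothetical helpers, not part of any FreeBSD tool.

# Print the hex bytes of "ø" in each encoding.
latin1_bytes() { printf '\370' | od -An -tx1 | tr -d ' \n'; }
utf8_bytes()   { printf '\303\270' | od -An -tx1 | tr -d ' \n'; }

# usage: try_grep LOCALE OCTAL_ESCAPES
# Builds the three test lines with the given bytes and runs grep
# on them with LC_ALL set to LOCALE for grep only.
try_grep() {
    ch=$(printf "$2")
    printf 'b%s\nhei\n%sl\n' "$ch" "$ch" | LC_ALL="$1" grep -- "$ch"
}

echo "ISO8859-1 bytes: $(latin1_bytes)"
echo "UTF-8 bytes:     $(utf8_bytes)"

# The reported case: ISO8859-1 bytes under the ISO8859-1 locale.
try_grep nb_NO.ISO8859-1 '\370'

# The suggested comparison: UTF-8 bytes under the UTF-8 locale.
try_grep nb_NO.UTF-8 '\303\270'
```

If the first call fails and the second one succeeds, the difference is in how grep handles the ISO8859-1 locale rather than in what the terminal or shell passes along.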