Date: Sat, 4 Feb 2023 10:47:27 +0100
From: Eivind Nicolay Evensen
To: Eugene Grosbein
Cc: freebsd-stable@freebsd.org
Subject: Re: Grep with non-ascii
Message-ID: <20230204104727.72bb3715@elg.hjerdalen.lokalnett>
List-Archive: https://lists.freebsd.org/archives/freebsd-stable

On Sat, 4 Feb 2023 08:41:17 +0700, Eugene Grosbein wrote:

> 03.02.2023 21:18, Eivind Nicolay Evensen wrote:
>
> > On Fri, 3 Feb 2023 19:12:32 +0700, Eugene Grosbein wrote:
> >
> >> 03.02.2023 17:06, Eivind Nicolay Evensen wrote:
> >>> Hello.
> >>>
> >>> I just noticed this today:
> >>>
> >>> elg!ene[~]> printf "bø\nhei\nøl\n" | grep ø
> >>> grep: trailing backslash (\)
> >>> elg!ene[~]> echo $LC_CTYPE $LANG
> >>> nb_NO.ISO8859-1 nb_NO.ISO8859-1
> >>>
> >>> While I have the result I envisioned with gnugrep:
> >>>
> >>> elg!ene[~]> printf "bø\nhei\nøl\n" | ggrep ø
> >>> bø
> >>> øl
> >>>
> >>> Also, on OpenIndiana, Linux and NetBSD, grep gives the proper
> >>> result.
> >>>
> >>> Is lib/libc/regex the right place to look into this if I
> >>> find the time, or does anybody know this enough to know the
> >>> problem?
> >>
> >> Try single quotes instead of double quotes.
> >> And please specify system version and shell name, and shell version
> >> if it's not in base system.
> >
> > This is
> > elg!ene[~]> uname -a
> > FreeBSD elg.hjerdalen.lokalnett 13.2-PRERELEASE FreeBSD
> > 13.2-PRERELEASE #1: Tue Jan 31 11:23:29 CET 2023
> > ene@elg.hjerdalen.lokalnett:/usr/obj/usr/src/amd64.amd64/sys/ENE-spurv
> > amd64
> >
> > Using the tcsh that comes with it. But I don't think the quotes
> > matter much because of this:
> >
> > elg!ene[~]> grep ø
> > grep: trailing backslash (\)
> >
> > The output was more just to have something to look for, like
> > with ggrep but anyway:
> >
> > elg!ene[~]> printf 'bø\nhei\nøl\n' |grep ø
> > grep: trailing backslash (\)
> >
> > And obviously:
> >
> > elg!ene[~]> printf 'bø\nhei\nøl\n'
> > bø
> > hei
> > øl
> >
> > And it seems to be the same for any 8859-1 character not part
> > of ascii:
> >
> > elg!ene[~]> grep ä
> > grep: trailing backslash (\)
> > elg!ene[~]> grep ß
> > grep: trailing backslash (\)
> > elg!ene[~]> grep ç
> > grep: trailing backslash (\)
>
> I checked it with the ru_RU.KOI8-R locale and the same problem
> manifested, with every Cyrillic letter. The following line shows the
> codes and characters of the affected positions in the last half of
> the 8-bit character table.
>
> $ jot -w '%o' - 128 255 1 | xargs -n2 -I^ printf '^ \^\n' | while read octal char; do grep -q "$char" /etc/motd 2>/dev/null; [ $? -gt 1 ] && echo $octal $char; done
>
> Note that this problem does not exist in 12.4 or earlier FreeBSD
> versions, so this is a recent regression. Surely that's due to the
> grep command being GNU grep in 12.4 but BSD grep in 13.x.

That makes sense, since I know for certain I have grepped for
Norwegian words containing æøå without seeing this problem before.
And I switched from 11 to 13 very late, and only because I wanted to
use hardware unsupported by the old one, so that would explain why it
took me so long to discover.

-- 
Eivind Nicolay Evensen
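
For anyone who finds the quoted jot/xargs one-liner hard to read, here is a rough sketch of the same check in plain sh. It only illustrates the idea (try every byte from 128 to 255 as a pattern and report those for which grep exits with a status above 1, meaning an error and not just "no match"). The function name check_high_bytes and its optional second argument (the grep command to run) are made up for this sketch and do not come from the thread.

```sh
#!/bin/sh
# Sketch only: list the upper-half bytes of an 8-bit table that make grep fail.
# check_high_bytes is a hypothetical helper name, not part of any tool.
#   $1  file to search (defaults to /etc/motd, as in the quoted one-liner)
#   $2  grep command to test (defaults to grep; e.g. ggrep for comparison)
check_high_bytes() {
    file=${1:-/etc/motd}
    grep_cmd=${2:-grep}
    code=128
    while [ "$code" -le 255 ]; do
        octal=$(printf '%o' "$code")
        # Build the single byte from its octal escape, e.g. \370 for ø in ISO 8859-1.
        char=$(printf "\\$octal")
        "$grep_cmd" -q -- "$char" "$file" 2>/dev/null
        status=$?
        # Exit status 0 = match, 1 = no match, >1 = grep reported an error.
        if [ "$status" -gt 1 ]; then
            printf '%s %s\n' "$octal" "$char"
        fi
        code=$((code + 1))
    done
}

# Example use (not run here):
#   check_high_bytes /etc/motd
#   check_high_bytes /etc/motd ggrep
```

Run under an 8-bit locale such as nb_NO.ISO8859-1 or ru_RU.KOI8-R, an unaffected grep should print nothing, while an affected one would be expected to print one line per failing byte.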