Date: Fri, 3 Feb 2023 12:36:47 -0500
List-Id: Production branch of FreeBSD source code
List-Archive: https://lists.freebsd.org/archives/freebsd-stable
Subject: Re: Grep with non-ascii
To: stable@freebsd.org
From: George Mitchell

On 2/3/23 11:06, Tomoaki AOKI wrote:
> [...]
> If this is the case like above, the only solution is to move to
> character set containing ALL characters all over the world.
>
> AFAIK, the only candidates are only two, TRON code [1] and Unicode (UCS,
> ISO/IEC 10646) [2]. And TRON code is very rarely used, actual candidate
> would be Unicode only.
> Note that Unicode is usually encoded to any of UTF-8, UTF-16 or UTF-32
> for data transfer (sometimes raw UCS-2?).
> [...]

The one positive development in the world of computing that I would
credit to Java is the earliest big push toward the adoption of UTF-8.
I strongly hope UTF-8 becomes universally used sooner rather than
later.

-- George