Date: Sat, 4 Feb 2023 13:16:37 +0900
From: Tomoaki AOKI
To: stable@freebsd.org
Subject: Re: Grep with non-ascii
Message-Id: <20230204131637.4e8e66e086eea57f4bb27b12@dec.sakura.ne.jp>

On Fri, 3 Feb 2023 12:36:47 -0500
George Mitchell wrote:

> On 2/3/23 11:06, Tomoaki AOKI wrote:
> > [...]
> > If this is the case like above, the only solution is to move to
> > character set containing ALL characters all over the world.
> >
> > AFAIK, the only candidates are only two, TRON code [1] and Unicode (UCS,
> > ISO/IEC 10646) [2]. And TRON code is very rarely used, actual candidate
> > would be Unicode only.
> > Note that Unicode is usually encoded to any of UTF-8, UTF-16 or UTF-32
> > for data transfer (sometimes raw UCS-2?).
> > [...]
>
> The one positive development in the world of computing that I would
> credit to Java is the earliest big push toward the adoption of UTF-8.
> I strongly hope UTF-8 becomes universally used sooner rather than
> later.
> -- George

And FreeBSD already has UTF-8. ;-)

The drawbacks of Unicode (and therefore of UTF-8, which encodes it) are:

 *Han unification.
  Japanese, Chinese and Korean characters that look alike but are not
  exactly the same have been wrongly unified into single code points.

 *Lack of proper support for variant forms of characters.
  Perhaps Unicode should have two more dimensions: one to distinguish
  wrongly unified CJK characters and another for variants.

 *Font sets.
  Very few fonts cover every Unicode code point that has actually been
  assigned a character.

 *The FreeBSD base system does not have a full Unicode font for vt yet.
  (Input methods are a separate problem, though.)

-- 
Tomoaki AOKI