Date: Sat, 4 Feb 2023 08:36:02 -0500
Subject: Re: Grep with non-ascii
To: stable@freebsd.org
List-Id: Production branch of FreeBSD source code
List-Archive: https://lists.freebsd.org/archives/freebsd-stable
References: <20230203110642.70e4a076@elg.hjerdalen.lokalnett>
 <819a4336-9689-bdbe-a90d-8f1d7b842662@grosbein.net>
 <20230203151853.02732bd6@elg.hjerdalen.lokalnett>
 <20230204010605.4874609f80eed28543407807@dec.sakura.ne.jp>
 <20230204131637.4e8e66e086eea57f4bb27b12@dec.sakura.ne.jp>
In-Reply-To: <20230204131637.4e8e66e086eea57f4bb27b12@dec.sakura.ne.jp>
From: George Mitchell

On 2/3/23 23:16, Tomoaki AOKI wrote:
> [...]
> And FreeBSD already has UTF-8. ;-)
>
> Drawbacks of UTF-8 are...
>  *Han unification. Not exactly same but lookalike characters in
>   Japanese, Chinese and Korean are fatally missingly unified.
>
>  *Lack of proper support for variant forms of characters.
>   Maybe Unicode should have another 2 dimensions, one for classifying
>   wrongly unified CJK characters and another one for variants.

I confess that I don't know enough to comment on those.

>
>  *Font sets. Very limited number of fonts covers the whole
>   Unicode codepoints that are assigned any of actual character.
>
>  *FreeBSD base does not have full Unicode font for vt yet.
>   (Input methods are the different problem, though.)
>

Yes, but FreeBSD is making progress on remedying these problems.
Many fonts DO have support for the codepoints I need, though. I
think these are less of a problem than the problems that UTF-8
solves.

-- George