Date: Thu, 9 Feb 2023 21:18:47 +0900
From: Tomoaki AOKI
To: stable@freebsd.org
Subject: Re: Grep with non-ascii

On Thu, 09 Feb 2023 03:24:14 +0100
"Julian H. Stacey" wrote:

> > The one positive development in the world of computing that I would
> > credit to Java is the earliest big push toward the adoption of UTF-8.
> > I strongly hope UTF-8 becomes universally used sooner rather than
> > later. -- George
>
> No idea What might be best for Arabic, Greek, Japanese etc: But
>
> For international English (& Italian where English font started)
> it's wrong to expect masses of people OK with Ascii, to waste time
> extending / learning / configuring tools for un-necessary UTF.
>
> Bad enough were single bytes above 0x7f for European accents (eg
> umlauts etc) that ignored conventions eg Ae Oe Ue (& SS = sharf
> ess since dumped in .de).
>
> USD GBP EUR avoid dodgey currency symbols `$` & `#` etc.
>
> UTF & HTML & MIME base 64 make spam filtering via procmail a nightmare.
> UTF is a spam indicator, most auto discarded here.
>
> ports/textproc/mgdiff was last to break here,
> Umlauts changed to Ascii, better than changing mgdiff.
>
> Cheers,
> --
> Julian Stacey www.StolenVotes.UK/jhs/ Arm Ukraine, Zap Putin. Brexit broke UK

IIUC, the 7-bit part of UTF-8 matches 7-bit ASCII exactly.
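A minimal sketch to check that claim, assuming Python 3. The helper names
`ascii_matches_utf8` and `utf8_bytes` are made up for illustration and the
sample character is arbitrary:

```python
def ascii_matches_utf8() -> bool:
    """Return True if every code point 0x00-0x7F encodes to the same
    single byte in UTF-8 as in ASCII."""
    for cp in range(0x80):
        ch = chr(cp)
        if ch.encode("utf-8") != ch.encode("ascii"):
            return False
        if ch.encode("utf-8") != bytes([cp]):
            return False
    return True


def utf8_bytes(text: str) -> bytes:
    """Encode text as UTF-8 (hypothetical helper, a thin wrapper)."""
    return text.encode("utf-8")


if __name__ == "__main__":
    print(ascii_matches_utf8())

    # Pure-ASCII text is byte-for-byte unchanged.
    print(utf8_bytes("grep"))

    # A non-ASCII character (U+00C4, A with umlaut) becomes a multi-byte
    # sequence made only of bytes >= 0x80, so it cannot be mistaken for
    # an ASCII byte.
    encoded = utf8_bytes("\u00c4")
    print(encoded, all(b >= 0x80 for b in encoded))
```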
So it would be harmless, at least to users who only use 7-bit ASCII.
But users who want the 8-bit part (graphic characters and so on), and
software that doesn't allow 8-bit characters, would be affected.

TRON code is quite different (its basic character unit is 2*n bytes),
but it is not supported or used in FreeBSD at all.

Furthermore, FreeBSD has defaulted to C.UTF-8 since Nov. 14, 2020 [1].
The actual commit is [2]. All reviewers listed in [1] approved the change.

Note that this was NOT MFC'ed to stable/12 or earlier, although every
13.x release has it (13.0 was released on Apr. 13, 2021).

[1] https://reviews.freebsd.org/D26973

[2] https://cgit.freebsd.org/src/commit/usr.bin/login/login.conf?id=09ef995baf45333d45ab214daf8c03e1a25f8fcc

-- 
Tomoaki AOKI