Re: Grep with non-ascii

From: Tomoaki AOKI <>
Date: Thu, 09 Feb 2023 12:18:47 UTC
On Thu, 09 Feb 2023 03:24:14 +0100
"Julian H. Stacey" <> wrote:

> > The one positive development in the world of computing that I would
> > credit to Java is the earliest big push toward the adoption of UTF-8.
> > I strongly hope UTF-8 becomes universally used sooner rather than
> > later.                                                     -- George
> No idea What might be best for Arabic, Greek, Japanese etc: But
> For international English (& Italian where English font started)
> it's wrong to expect masses of people OK with Ascii, to waste time
> extending / learning / configuring tools for un-necessary UTF.
> Bad enough were single bytes above 0x7f for European accents (eg
> umlauts etc) that ignored conventions eg Ae Oe Ue (& SS = sharf
> ess since dumped in .de).  
> USD GBP EUR avoid dodgey currency symbols `$` & `#` etc.
> UTF & HTML & MIME base 64 make spam filtering via procmail a nightmare.
> UTF is a spam indicator, most auto discarded here.
> ports/textproc/mgdiff was last to break here,
> Umlauts changed to Ascii, better than changing mgdiff.
> Cheers,
> -- 
> Julian Stacey www.StolenVotes.UK/jhs/ Arm Ukraine, Zap Putin.  Brexit broke UK

IIUC, the 7-bit part of UTF-8 matches 7-bit ASCII exactly.
So it would be harmless, at least to users who use only 7-bit ASCII.
But users who want the 8-bit part (graphic characters and so on), and
software that doesn't allow 8-bit characters, would be affected.
TRON code is quite different (its basic character unit is 2*n bytes),
but it's not supported or used in FreeBSD at all.
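
To illustrate the ASCII compatibility point above, here is a minimal
Python sketch. The sample strings and the helper is_7bit_clean() are
made up for this example and are not taken from any FreeBSD tool. It
only checks that pure ASCII text yields the same bytes in both
encodings, while a non-ASCII character turns into bytes at or above
0x80.

```python
# Illustrative only: sample strings and helper name are assumptions.
ascii_text = "grep -n pattern file.txt"
accented = "M\u00fcller"  # "Mueller" written with a u-umlaut (U+00FC)

# Pure 7-bit ASCII text encodes to the very same bytes in ASCII and UTF-8.
assert ascii_text.encode("ascii") == ascii_text.encode("utf-8")

# A non-ASCII character becomes a multi-byte UTF-8 sequence in which
# every byte has the high bit set (>= 0x80).
umlaut_bytes = "\u00fc".encode("utf-8")
print(umlaut_bytes)  # b'\xc3\xbc'
assert all(b >= 0x80 for b in umlaut_bytes)


def is_7bit_clean(data: bytes) -> bool:
    """Return True if no byte in data uses the 8th bit."""
    return all(b < 0x80 for b in data)


print(is_7bit_clean(ascii_text.encode("utf-8")))  # True
print(is_7bit_clean(accented.encode("utf-8")))    # False
```

So a file that is 7-bit clean is already valid UTF-8, and only data
that uses the 8th bit can behave differently under a UTF-8 locale.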

Furthermore, FreeBSD has defaulted to C.UTF-8 since Nov. 14, 2020. [1]
The actual commit is [2].
All reviewers listed in [1] approved the change.

Note that this was NOT MFC'ed to stable/12 and earlier, although all
13.x releases have it (13.0 was released on Apr. 13, 2021).


Tomoaki AOKI    <>