[RFC] Set the default locale to en_US.UTF-8

Peter Jeremy peter at rulingia.com
Sun Jan 25 19:00:11 UTC 2015


On 2015-Jan-25 18:50:00 +0300, Slawa Olhovchenkov <slw at zxy.spb.ru> wrote:
>On Sun, Jan 25, 2015 at 06:58:13AM -0800, Jordan Hubbard wrote:
>> > On Jan 25, 2015, at 6:32 AM, Slawa Olhovchenkov <slw at zxy.spb.ru> wrote:
>> > 
>> > NO! Please, don't!
>> > Not every byte string is valid UTF-8; as a result, sed, grep, vi, ed,
>> > etc. will fail unpredictably.

I switched to en_AU.UTF-8 about 5 years ago with relatively little pain
(though I had very little non-ASCII text).

The downside of UTF-8 is that random non-ASCII byte strings are unlikely to
be valid UTF-8 and will therefore get rejected.  About the only time I get
bitten by this is with my random password generator:
  dd if=/dev/random bs=32 count=1 | tr -cd '!-~'
which dies with a "tr: Illegal byte sequence" error and needs "LC_ALL=C" to
placate it.
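
For illustration, a minimal sketch of the placated version.  The function
name gen_password is made up for this example; the only real change is
that LC_ALL=C is set for tr alone, so tr treats each byte as one character
and has nothing to reject as invalid UTF-8:

```shell
# Hypothetical wrapper around the one-liner above (the name is an assumption).
gen_password() {
    # Read 32 random bytes; dd's transfer statistics on stderr are discarded.
    # LC_ALL=C applies only to tr, so the rest of the session keeps its locale.
    # tr -cd deletes every byte that is NOT in the printable range '!' to '~'.
    dd if=/dev/random bs=32 count=1 2>/dev/null | LC_ALL=C tr -cd '!-~'
    echo    # finish with a newline
}

gen_password
```

The result is shorter than 32 characters, because every byte outside the
printable ASCII range is dropped rather than replaced.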

At least with emacs (and I think vi), you can override the default locale
on a file-by-file basis - and emacs is very good at coping with non-UTF-8
files in a UTF-8 locale, as well as translating between locales.

>> It's a good idea to change it.  We have outgrown ISO-Latin1, and UTF-8 solves a host of ugly I18N interoperability problems when used consistently.

Agreed.  IMHO, this is long overdue.

>I have used ru_RU.KOI8-R for years. Now I am trying ru_RU.UTF8 and have
>run into some issues (on 10-STABLE). 9.x and OS may have different
>versions of software and don't touch this.

Once you've started using any 8-bit locale, switching to UTF-8 (or to
another 8-bit locale) will be a PITA because you need to re-encode everything.
And, since it's very difficult to run with multiple locales, you need to
do a complete sweep when you change locales.  If you are running into
specific issues with incorrect handling of ru_RU.UTF8, that is a bug and
you need to report it.
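
As a rough sketch of what re-encoding one piece of data looks like, assuming
iconv(1) is installed and the input really is KOI8-R (iconv cannot detect the
source encoding for you).  The function name koi8r_to_utf8 is invented for
this example:

```shell
# Hypothetical helper: convert KOI8-R text on stdin to UTF-8 on stdout.
koi8r_to_utf8() {
    iconv -f KOI8-R -t UTF-8
}

# Sample input: the six bytes below spell "Привет" in KOI8-R
# (octal escapes are used because they are portable across printf versions).
printf '\360\322\311\327\305\324\n' | koi8r_to_utf8
```

For a file, something like "koi8r_to_utf8 < old.txt > new.txt" would do
(both file names are placeholders); write to a new file rather than
converting in place, so a wrong guess about the source encoding is
recoverable.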

Note that we're talking about changing the default - you already override
the default so it won't affect you.

>This change (from a single-byte to a multi-byte locale) could be made
>individually, after inspecting each system. It may be OK for new
>installs, but not [automatically] for updates/upgrades.

Either an existing system has already overridden the default locale, so
changing the default will have no impact, or the treatment of non-ASCII
data is currently undefined, so changing the default replaces undefined
behaviour with an explicit warning to the user that they have problems with
their data.

-- 
Peter Jeremy

