Re: What's the locale for system files (e.g. /etc/fstab)?

In reply to: Pau Amma : "Re: What's the locale for system files (e.g. /etc/fstab)?"

From: Warner Losh <imp_at_bsdimp.com>
Date: Fri, 25 Mar 2022 14:27:12 UTC

On Fri, Mar 25, 2022, 5:10 AM Pau Amma <pauamma@gundo.com> wrote:

> (pruned cc: to just the list)
>
> On 2022-03-25 04:08, Warner Losh wrote:
> > On Thu, Mar 24, 2022 at 2:51 PM Phil Shafer <phil@juniper.net> wrote:
> >
> >> On 24 Mar 2022, at 15:12, Warner Losh wrote:
> >> > That is the primary reason for system files always being C.UTF-8...
> >> > There is no way to tag it as anything else... and some of these files
> >> > are often parsed from a context that can't set the locale, like the
> >> > boot loader or the kernel... also, these files have a format that was
> >> > defined back in the 7bit ascii time frame. They also don't make use of
> >> > the text in a way that isn't literal...
> >>
> >> Exactly.  There's just no way to know in the current setup.  And
> >> declaring it UTF-8 will break anyone currently using locale-based
> >> values.  Using the symlink has the value of allowing a simple fix
> >> ("sudo ln -s $LANG /etc/locale").
> >
> > > Except it's not a simple fix. Sure, you can find this value, but
> > > nothing will use it, necessarily. Since there's little value and
> > > little need, I think it would be more hassle than it's worth absent
> > > a much more extensive audit. For system wide things like config
> > > files, we assume C.UTF-8 or the lesser ASCII-7 (or maybe ASCII-8).
>
> There's no ASCII-8. (If you meant 8859-*, there are 15 or 16, which
> essentially means "no".) Assuming ASCII (and therefore 7-bit) went out
> of style last millennium. Anything that expects or enforces something
> other than Unicode (which for all practical purposes means UTF-8) needs
> to be fixed urgently.
>

ASCII-8 here is just sloppy shorthand for "no multibyte character
support." All the parsing routines just look for certain fixed separator
bytes between sequences of bytes. This will likely never change, but if it
does, a lot of work would be needed to prove correctness, and everything
that reads these files would need to change.
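
The byte-oriented parsing described above can be sketched roughly as
follows. This is only an illustration, not FreeBSD code: `split_fields`
and the sample fstab-style entry are hypothetical, and real parsers are
written in C with their own rules for comments and escapes.

```python
# Hypothetical helper (not FreeBSD code): split one config line into fields
# by looking only for ASCII whitespace bytes. Field contents are never
# decoded, so they are treated as opaque byte sequences.
def split_fields(line: bytes) -> list[bytes]:
    # bytes.split() with no argument splits on runs of ASCII whitespace.
    return line.split()


# Assumed sample entry; the mount point contains a non-ASCII character.
entry = "/dev/ada0p2 /mnt/données ufs rw 2 2\n".encode("utf-8")
fields = split_fields(entry)

# The non-ASCII field comes back byte-for-byte intact.
mount_point = fields[1].decode("utf-8")
```

Nothing here needs to know the locale: the parser only ever compares
bytes against a handful of fixed ASCII values.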

UTF-8 works because it mostly stays out of the way of this naive code:
every byte of a multibyte sequence has the high bit set, so none of them
can be mistaken for a 7-bit ASCII value, and all the special characters
are 7-bit ASCII.
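
A small check of that property, as a sketch: the helper name and the
sample characters are assumptions chosen for illustration. Shift_JIS is
included only as a contrasting example of an encoding whose trailing
bytes can collide with ASCII; it is not mentioned elsewhere in this
thread.

```python
# Hypothetical helper: does a multibyte encoding of `ch` contain any byte
# in the 7-bit ASCII range (below 0x80)?
def has_ascii_byte_in_multibyte(ch: str, encoding: str) -> bool:
    raw = ch.encode(encoding)
    return len(raw) > 1 and any(b < 0x80 for b in raw)


# Assumed sample of characters needing 2, 3 and 4 bytes in UTF-8.
samples = ["é", "ß", "€", "表", "😀"]

# In UTF-8, lead bytes are 0xC2-0xF4 and continuation bytes are 0x80-0xBF,
# so no byte of a multibyte sequence falls in the ASCII range.
utf8_clean = not any(has_ascii_byte_in_multibyte(c, "utf-8") for c in samples)

# Contrast: Shift_JIS encodes U+8868 as 0x95 0x5C, and 0x5C is the ASCII
# backslash, which byte-wise code could mistake for an escape character.
sjis_bytes = "表".encode("shift_jis")
sjis_collides = has_ascii_byte_in_multibyte("表", "shift_jis")
```

A sample like this demonstrates the property rather than proving it; the
guarantee itself comes from the byte ranges UTF-8 assigns to lead and
continuation bytes.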

Warner

-- 
> #BlackLivesMatter #TransWomenAreWomen #AccessibilityMatters
> #StandWithUkrainians
> English: he/him/his (singular they/them/their/theirs OK)
> French: il/le/lui (iel/iel and ielle/ielle OK)
> Tagalog: siya/niya/kaniya (please avoid sila/nila/kanila)