From: Warner Losh
Date: Thu, 24 Mar 2022 09:31:33 -0600
Subject: Re: What's the locale for system files (e.g. /etc/fstab)?
To: "Rodney W. Grimes"
Cc: Phil Shafer, FreeBSD Hackers, "Simon J. Gerraty"
List-Id: Technical discussions relating to FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-hackers
In-Reply-To: <202203241519.22OFJ3Mk098649@gndrsh.dnsmgr.net>
References: <70B211BB-15BA-47A4-8F9C-C833AA8C1EAA@freebsd.org> <202203241519.22OFJ3Mk098649@gndrsh.dnsmgr.net>

On Thu, Mar 24, 2022, 9:20 AM Rodney W. Grimes <freebsd-rwg@gndrsh.dnsmgr.net> wrote:

> > On 23 Mar 2022, at 11:51, Piotr Pawel Stefaniak wrote:
> > > mount: make libxo support more locale-aware
> > >
> > >    "special", "node", and "mounter" are not guaranteed to be encoded
> > >    with UTF-8. Use the appropriate modifier.
> > >
> > > -       xo_emit("{:special}{L: on }{:node}{L: (}{:fstype}",
> > >             sfp->f_mntfromname,
> > > +       xo_emit("{:special/%hs}{L: on }{:node/%hs}{L: (}{:fstype}",
> > >             sfp->f_mntfromname,
> > >             sfp->f_mntonname, sfp->f_fstypename);
> >
> > This recent "mount" patch highlights a libxo-related problem for which I
> > don't have a solution:
> >
> > There are several files for which the encoding is not known.  Since
> > locale is user specific, we don't know how to interpret the contents of
> > /etc/fstab.  It's assumably been encoded with the format of the user who
> > wrote it, but that information is lost.
>
> Since you say "locale is user specific" it makes me want to say that
> this should come from the environment set by "default:" in /etc/login.conf,
> no need for a new file or anything special.

Config files, like fstab, have no locale, and parsing them with a locale
leads to errors, even when the user or the system has a non-default locale.

> > Put more generally, there's not a system-wide place which declares the
> > encoding for system files, which leads to this problem where we
> > interpret files from one user's locale using another user's locale.
>
> Well /etc/login.conf *IS* a system wide declaration of this type of
> stuff, both lang= and charset= are declared there.

Since system-wide files like these are always parsed without a locale, this
information is correct, but I'm not sure how it applies.

It is always C.UTF-8. Anything else may, or may not, work based on accidents
of coincident encoding. Not everything can change locales, and the fstab and
other parsing routines in libc assume C.UTF-8 or even just the ASCII-7/8
subset.

> > One solution would a symlink in /etc that "points to" the name of the
> > current system-wide locale name.
> >
> > % ls -Fl /etc/locale
> > lrwxr-xr-x  1 root  wheel  7 Mar 23 15:42 /etc/locale@ -> C.UTF-8
>
> grep lang /etc/login.conf:
>         :lang=C.UTF-8:
>         :lang=ru_RU.UTF-8:\
>
> Probably what you want?

You can get this with the locale routines, no? No need for grep.

Warner

> > (Or "/etc/system.locale" ?)
> >
> > If the symlink doesn't exist, would "C.UTF-8" be a suitable default
> > moving forwards?  It certainly would not be backwards compatible, since
> > an existing fstab could have non-UTF-8 strings in it, encoded with the
> > locale of the user who touched the file.  But there's really no
> > backwards compatible solution, given that there's no guarantee that (for
> > any specific FreeBSD system) all system files were written with the same
> > locale.  Fun, eh? ;^)
> >
> > Opinions, thoughts, please?
> >
> > Thanks,
> >  Phil
>
> --
> Rod Grimes                                                 rgrimes@freebsd.org