Question about ASCII and nl_langinfo (locale work)

Mon Nov 16 19:00:39 UTC 2015

On 16.11.2015 20:35, Ed Schouten wrote:
> I personally think it's a shame if we were to deviate from returning
> "US-ASCII", for the reason that "US-ASCII" also happens to be the
> preferred MIME name for the character set:
> 
> http://www.iana.org/assignments/character-sets/character-sets.xhtml
> 
> "ASCII" doesn't even seem to be an alias for this character set.

Yes, I overlooked that somehow. ASCII is not in the IANA registry, while
both ANSI_X3.4-1968 and US-ASCII are.

So I am reconsidering the proposal. We can return ANSI_X3.4-1968 for
POSIX/C (for Linux compatibility reasons) and leave pure US-ASCII as it
was (since it is rarely used).
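
To make the compatibility point concrete, here is a minimal sketch of
how a program might query the codeset of the C locale and accept either
spelling. The helper names and the set of accepted names are assumptions
for illustration, not taken from any particular port; the value actually
returned depends on the platform's libc.

```python
import locale

# Assumed, illustrative set of names a portable program might treat as
# plain ASCII. It is not an authoritative list.
ASCII_NAMES = {"ANSI_X3.4-1968", "US-ASCII", "ASCII", "646"}


def c_locale_codeset() -> str:
    """Return nl_langinfo(CODESET) for the C locale, then restore LC_CTYPE."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query only, no change
    try:
        locale.setlocale(locale.LC_CTYPE, "C")
        return locale.nl_langinfo(locale.CODESET)
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)


def is_ascii_codeset(name: str) -> bool:
    """True if the name is one of the assumed spellings of plain ASCII."""
    return name.upper() in ASCII_NAMES
```

A caller that uses is_ascii_codeset(c_locale_codeset()) rather than
comparing against one literal string keeps working whichever of the two
names the C locale reports.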

> In my opinion a decent implementation of newlocale() should support
> any of the character set names and aliases provided on the IANA page,
> but let nl_langinfo(CODESET) return the preferred MIME name.

BTW, we already have, and historically return, non-IANA codesets
(inspired by X11). E.g. we have ISO8859-* instead of the preferred names
ISO-8859-*; moreover, the ISO8859-* names are not even aliases (!), so
IANA knows nothing about them. Linux has the IANA preferred names here,
i.e. ISO-8859-*.

So the question is: should we rename ISO8859-* to ISO-8859-* to be IANA
and Linux compatible?

We can strip the first (or all) "_" and "-" characters from the names
taken from the environment (as Linux does), so as not to violate POLA.
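
The stripping idea can be sketched roughly as follows. The function
names are hypothetical, and the lower-casing step is an added assumption
on top of the "_" / "-" stripping described above; this is not a copy of
what any libc actually does.

```python
def normalize_codeset(name: str, strip_all: bool = True) -> str:
    """Lower-case a codeset name and drop "_" and "-".

    With strip_all=False only the first such character is removed.
    """
    lowered = name.lower()  # assumption: compare case-insensitively
    if strip_all:
        return lowered.replace("_", "").replace("-", "")
    for i, ch in enumerate(lowered):
        if ch in "_-":
            return lowered[:i] + lowered[i + 1:]
    return lowered


def same_codeset(a: str, b: str) -> bool:
    """Compare two codeset names after normalization."""
    return normalize_codeset(a) == normalize_codeset(b)
```

With this, a user who still has ISO8859-1 in the environment matches a
locale that is stored as ISO-8859-1, and vice versa.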

>> That means we need to teach all upstream about US-ASCII all the time.
> 
> Could you come up with a concrete list of pieces of software that need
> to be changed? Is it just those three pieces of software that you
> mentioned above? If so, then I think it would be a shame to make the
> concession.

No, I have seen such checks many times in other programs too; tcl is
just one that can be found quickly. The proper way to assess the
situation would be to unpack _all_ ports and search through the code,
but my machine can't handle that.
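
For a smaller subset of unpacked sources, the search could look roughly
like this. It is only a sketch: the function name, the file extensions
and the pattern of string literals are assumptions, and real code may
build or compare these names in ways a literal search will miss.

```python
import os
import re

# Assumed pattern: quoted codeset names that a program might hard-code.
CODESET_LITERAL = re.compile(
    rb'"(ANSI_X3\.4-1968|US-ASCII|ISO-?8859-\d+)"'
)


def find_codeset_literals(root):
    """Return (path, line number, name) for each hard-coded codeset name."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for fname in files:
            if not fname.endswith((".c", ".h", ".cc", ".cpp")):
                continue
            path = os.path.join(dirpath, fname)
            try:
                with open(path, "rb") as fh:
                    for lineno, line in enumerate(fh, 1):
                        match = CODESET_LITERAL.search(line)
                        if match:
                            name = match.group(1).decode("ascii")
                            hits.append((path, lineno, name))
            except OSError:
                continue  # unreadable file, skip it
    return hits
```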

-- 
http://ache.vniz.net/