bin/86450: tr translates wrong in german environment
Oliver Fromme
olli at lurza.secnetix.de
Thu Sep 22 02:10:08 PDT 2005
The following reply was made to PR bin/86450; it has been noted by GNATS.
From: Oliver Fromme <olli at lurza.secnetix.de>
To: bug-followup at FreeBSD.org, andy.321 at web.de
Cc:
Subject: Re: bin/86450: tr translates wrong in german environment
Date: Thu, 22 Sep 2005 11:09:04 +0200 (CEST)
Andreas <andy.321 at web.de> wrote:
> > Synopsis: tr translates wrong in german environment
It doesn't. As far as I can tell, the PR can be closed,
because it's a feature, not a bug. :-)
> While playing with tr I found this:
>
> > setenv LANG de_DE.ISO8859-15
> > echo "ABCDEFGHIJKLMNOPQRSTUVWXYZ" | tr A-Z a-z
> > abcdefghijklmnopqrsßtüvwxÿ
>
> same in LANG=de_AT.ISO8859-15 or de_CH.ISO8859-15 (all de_*), maybe in other
> languages, but not in LANG=C or da_DK.ISO8859-15 (I do not try other languages)
That's correct and expected behaviour (POSIX / SUS).
The reason is that expressions like "a-z" depend on the
locale, particularly LC_COLLATE which controls alphabetic
ordering. In the German-language locales, the collation
order places the German symbol "ß" right
after "s", but there is no such symbol in the uppercase
equivalent, so the collation sequences have different
lengths. That's why you get garbage after that point.
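The shift can be reproduced without a German locale by spelling out
the two sets by hand. This is only a sketch: "#" is a stand-in I
chose for "ß", and LC_ALL=C is forced so that the result does not
depend on the local environment.

```shell
# tr maps by position: the n-th character of the first set is
# replaced by the n-th character of the second set.
# '#' is a hypothetical stand-in for the extra lowercase symbol
# that has no uppercase counterpart.
shifted=$(printf 'RSTUV\n' | LC_ALL=C tr 'RSTUV' 'rs#tuv')
echo "$shifted"
```

With GNU and BSD tr this should print "rs#tu": everything after "S"
is mapped one position too early, which is the same effect as in the
PR.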
In general it is a bad idea to use expressions like "A-Z"
or "a-z" with tr. You might get correct results in one
locale, but garbage in others. The following will work
and produce the expected result:
$ echo "ABCDEFGHIJKLMNOPQRSTUVWXYZ" | tr "[:upper:]" "[:lower:]"
abcdefghijklmnopqrstuvwxyz
Another way to perform lowercase conversion is to use awk:
$ echo "ABCDEFGHIJKLMNOPQRSTUVWXYZ" | awk '{print tolower($0)}'
abcdefghijklmnopqrstuvwxyz
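If you want to check both methods on your own system, something
like the following should do. It is a minimal sketch; the variable
names are mine and not part of any script mentioned in the PR.

```shell
# Lowercase the same input with both methods and compare the results.
input="ABCDEFGHIJKLMNOPQRSTUVWXYZ"
via_tr=$(printf '%s\n' "$input" | tr '[:upper:]' '[:lower:]')
via_awk=$(printf '%s\n' "$input" | awk '{print tolower($0)}')

if [ "$via_tr" = "$via_awk" ]; then
    echo "both methods agree: $via_tr"
else
    echo "mismatch: tr=$via_tr awk=$via_awk" >&2
fi
```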
Unfortunately there are (third-party) scripts which use
tr in the wrong way. Therefore my recommendation is to
not set LANG or LC_ALL, but instead only set LC_CTYPE
(for ISO8859 character support), and maybe LC_MESSAGES,
LC_NUMERIC and LC_TIME if desired (although these might
have bad side effects, too). If LC_COLLATE is required
for certain applications, then set it only for those
applications, but not in the global environment.
YMMV, of course.
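One way to limit LC_COLLATE to a single command is to set it in a
subshell. The following is only a sketch: sort stands in for the
application, and the value "C" is used just to make the result
predictable. Put the locale your application needs there instead.

```shell
# Remember the current setting so we can see that it is not changed.
before=${LC_COLLATE-unset}

# $( ... ) runs in a subshell, so nothing set here leaks out.
sorted=$(
    # LC_ALL, if set, overrides LC_COLLATE, so clear it here.
    unset LC_ALL
    LC_COLLATE=C
    export LC_COLLATE
    printf 'b\nB\na\nA\n' | sort
)

after=${LC_COLLATE-unset}
echo "$sorted"
echo "LC_COLLATE before: $before, after: $after"
```

The calling shell keeps whatever LC_COLLATE it had before, so other
scripts that use tr with ranges are not affected.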
Best regards
Oliver
--
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing
Dienstleistungen mit Schwerpunkt FreeBSD: http://www.secnetix.de/bsd
Any opinions expressed in this message may be personal to the author
and may not necessarily reflect the opinions of secnetix in any way.
"[...] one observation we can make here is that Python makes
an excellent pseudocoding language, with the wonderful attribute
that it can actually be executed." -- Bruce Eckel