bin/86450: tr translates wrong in german environment

Oliver Fromme olli at lurza.secnetix.de
Thu Sep 22 02:10:08 PDT 2005


The following reply was made to PR bin/86450; it has been noted by GNATS.

From: Oliver Fromme <olli at lurza.secnetix.de>
To: bug-followup at FreeBSD.org, andy.321 at web.de
Cc:  
Subject: Re: bin/86450: tr translates wrong in german environment
Date: Thu, 22 Sep 2005 11:09:04 +0200 (CEST)

 Andreas <andy.321 at web.de> wrote:
  > > Synopsis:       tr translates wrong in german environment
 
 It doesn't.  As far as I can tell, the PR can be closed,
 because it's a feature, not a bug.  :-)
 
  >       While playing with tr I found this:
  > 
  > > setenv LANG de_DE.ISO8859-15
  > > echo "ABCDEFGHIJKLMNOPQRSTUVWXYZ" | tr A-Z a-z
  > > abcdefghijklmnopqrsßtüvwxÿ
  > 
  > Same in LANG=de_AT.ISO8859-15 or de_CH.ISO8859-15 (all de_*), maybe in other
  > languages, but not in LANG=C or da_DK.ISO8859-15 (I did not try other languages)
 
 That's correct and expected behaviour (POSIX / SUS).
 
 The reason is that expressions like "a-z" depend on the
 locale, particularly LC_COLLATE which controls alphabetic
 ordering.  In the German-language locales, the collation
 order places the German letter "ß" (sharp s) right
 after "s", but there is no such letter in the uppercase
 range, so the two ranges expand to sequences of different
 lengths.  tr pairs the two sets position by position, so
 the mapping goes wrong from that point on.
 
 In general it is a bad idea to use expressions like "A-Z"
 or "a-z" with tr.  You might get correct results in one
 locale, but garbage in others.  The following will work
 and produce the expected result:
 
 $ echo "ABCDEFGHIJKLMNOPQRSTUVWXYZ" | tr "[:upper:]" "[:lower:]"
 abcdefghijklmnopqrstuvwxyz
 
 Another way to perform lowercase conversion is to use awk:
 
 $ echo "ABCDEFGHIJKLMNOPQRSTUVWXYZ" | awk '{print tolower($0)}'
 abcdefghijklmnopqrstuvwxyz
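 
 If the input is known to be plain ASCII, a third option is
 to run just the tr command in the C locale, where (as the
 report already notes) the ranges behave as expected.  The
 following is only a sketch; the function name to_lower_c is
 made up for illustration.  LC_ALL is used because it
 overrides LANG and all other LC_* variables for that one
 command.

```shell
# to_lower_c is a hypothetical helper name, not an existing utility.
# LC_ALL=C applies to this single tr invocation only; in the C locale
# A-Z and a-z are the 26 ASCII letters, so the two ranges pair up
# one to one.  Characters outside A-Z are passed through unchanged.
to_lower_c() {
    printf '%s\n' "$1" | LC_ALL=C tr 'A-Z' 'a-z'
}

to_lower_c "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
```

 Note that this does not lowercase non-ASCII letters such as
 umlauts, so the character classes above remain the better
 choice for text in a national character set.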
 
 Unfortunately there are (third-party) scripts which use
 tr in the wrong way.  Therefore my recommendation is to
 not set LANG or LC_ALL, but instead only set LC_CTYPE
 (for ISO8859 character support), and maybe LC_MESSAGES,
 LC_NUMERIC and LC_TIME if desired (although these might
 have bad side effects, too).  If LC_COLLATE is required
 for certain applications, then set it only for those
 applications, but not in the global environment.
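 
 As a rough sketch of setting LC_COLLATE for one application
 only, a small wrapper can pass it through env(1).  The name
 run_with_collate is made up for illustration.  This assumes
 LC_ALL is not set, because LC_ALL would override LC_COLLATE.

```shell
# run_with_collate is a hypothetical helper, not part of any system.
# Usage: run_with_collate LOCALE command [args...]
# env sets LC_COLLATE for the child process only; the calling shell's
# own LC_COLLATE setting is left untouched.
run_with_collate() {
    loc=$1
    shift
    env LC_COLLATE="$loc" "$@"
}

# Example: sort three lines using the C collation order.  A name such
# as de_DE.ISO8859-15 would be used the same way where it is installed.
printf 'b\na\nB\n' | run_with_collate C sort
```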
 
 YMMV, of course.
 
 Best regards
    Oliver
 
 -- 
 Oliver Fromme,  secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing
 Dienstleistungen mit Schwerpunkt FreeBSD: http://www.secnetix.de/bsd
 Any opinions expressed in this message may be personal to the author
 and may not necessarily reflect the opinions of secnetix in any way.
 
 "[...]  one observation we can make here is that Python makes
 an excellent pseudocoding language, with the wonderful attribute
 that it can actually be executed."  --  Bruce Eckel

