"Unprintable" 8-bit characters
freebsd at edvax.de
Wed Nov 9 02:10:27 UTC 2011
On Wed, 09 Nov 2011 02:51:31 +0100, Michael Ross wrote:
> On 09.11.2011, 01:42, Conrad J. Sabatier <conrads at cox.net> wrote:
> > Pardon me if this may seem like a stupid question, but this is
> > something that's been bugging me for a long time, and none of my
> > research has turned up anything useful yet.
> > I've been trying to understand what the deal is with regards to the
> > displaying of the "extended" 8-bit character set, i.e., 8-bit characters
> > with the MSB set.
> > More specifically, I'm trying to figure out how to get the "ls" command
> > to properly display filenames containing characters in this extended
> > set. I have some MP3 files, for instance, whose names contain certain
> > European characters, such as the lowercase "u" with umlaut (code 0xfc
> > in the Latin set, according to gucharmap), that I just can't get ls to
> > display properly. These characters seem to be considered by ls as
> > "unprintable", and the best I've been able to produce in the ls
> > output is backslash interpretations of the characters using either the
> > -B or -b options, otherwise the default "?" is displayed in their place.
> Unsure if I understand you correctly.
> ("extended" 8-bit character set with MSB? utf-16?)
> I'm confused by this charset stuff in general.
> Assuming you want \0xfc displayed as "ü",
> > cat test.py && python test.py && ls -l
> # -*- coding: utf-8 -*-
> total 2
> -rw-r--r-- 1 michael wheel 29 9 Nov 02:43 test.py
> -rw-r--r-- 1 michael wheel 0 9 Nov 02:44 ü
> here is what works for me:
> in my login class in /etc/login.conf:
> ``cap_mkdb /etc/login.conf'' after changes
Ah, thanks - that seems to be the proper way to have
the environment variables set - instead of my (ab)use
of setenv's in the csh config file. :-)
Note the "precedence" of $LANG vs. $LC_* (as they can
be used to configure things more precisely, e. g.
regarding system messages or date formats; see the example below).
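The precedence mentioned above can be sketched in a few lines of Python. This is only a model of the POSIX lookup order (LC_ALL first, then the category's own LC_* variable, then LANG), not the libc implementation; the function name effective_locale and the sample environment are made up for illustration, and the final fallback to "C" is a simplification.

```python
def effective_locale(category, env):
    """Return the locale that applies to one category, e.g. "LC_CTYPE"."""
    # POSIX lookup order: LC_ALL, then the category's own variable, then LANG.
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:  # unset and empty variables are both skipped
            return value
    return "C"  # simplification: the real default is implementation-defined


# Hypothetical environment, used only for illustration.
env = {"LANG": "de_DE.ISO8859-1", "LC_MESSAGES": "en_US.ISO8859-1"}
print(effective_locale("LC_MESSAGES", env))  # en_US.ISO8859-1
print(effective_locale("LC_CTYPE", env))     # de_DE.ISO8859-1

# Once LC_ALL is set, it wins over every other LC_* variable and over LANG.
env["LC_ALL"] = "en_US.ISO8859-1"
print(effective_locale("LC_CTYPE", env))     # en_US.ISO8859-1
```

In other words, LANG is the general default, the individual LC_* variables refine it per category, and LC_ALL overrides all of them.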
> in /etc/rc.conf:
Hm? CP437? Codepage? Isn't that some MS-DOS thing?
I've never needed a screenmap to make "extended
characters" (everything beyond US-ASCII) work.
> and in /etc/ttys, console type is set to ``cons25l1''
I have a similar setting here, but that does _not_ work
with UTF-8 encoded characters. If I want to use them, I
have to change some environment variables, from
#-------GERMAN/ENGLISH------------------------ <=== DEFAULT
setenv LC_ALL en_US.ISO8859-1
setenv LC_MESSAGES en_US.ISO8859-1
setenv LC_COLLATE de_DE.ISO8859-1
setenv LC_CTYPE de_DE.ISO8859-1
setenv LC_MONETARY de_DE.ISO8859-1
setenv LC_NUMERIC de_DE.ISO8859-1
setenv LC_TIME de_DE.ISO8859-1
to
setenv LC_ALL en_US.UTF-8
setenv LC_MESSAGES en_US.UTF-8
setenv LC_COLLATE de_DE.UTF-8
setenv LC_CTYPE de_DE.UTF-8
setenv LC_MONETARY de_DE.UTF-8
setenv LC_NUMERIC de_DE.UTF-8
setenv LC_TIME de_DE.UTF-8
setenv LANG de_DE.UTF-8
Then I can use UTF-8 characters inside rxvt-unicode. Of
course, text mode console is limited to the first set
of configuration, using the ISO 8859-1 character set.
This worked long before UTF-8 arrived with the glorious
idea that I should have 2 bytes where one is sufficient,
to describe our (German) 6 umlauts and the Eszett ligature. :-)
Improper settings will result in garbage such as A-tilde,
three quarters, upside-down question mark, depending on the
editor or terminal used.
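To make the byte counts and the garbage concrete, here is a small Python sketch. It uses only the built-in codecs; the variable names are arbitrary, and ascii() is used for printing so that the output does not itself depend on the terminal settings.

```python
umlaut = "\u00fc"  # LATIN SMALL LETTER U WITH DIAERESIS

latin1_bytes = umlaut.encode("iso-8859-1")
utf8_bytes = umlaut.encode("utf-8")
print(latin1_bytes)  # b'\xfc'      (one byte)
print(utf8_bytes)    # b'\xc3\xbc'  (two bytes)

# UTF-8 bytes shown by something that assumes ISO 8859-1:
# two characters appear, A-tilde followed by the one-quarter sign.
mojibake = utf8_bytes.decode("iso-8859-1")
print(ascii(mojibake))  # '\xc3\xbc'

# An ISO 8859-1 byte read by something that assumes UTF-8:
# 0xfc alone is not valid UTF-8, so a replacement character is substituted.
broken = latin1_bytes.decode("utf-8", errors="replace")
print(ascii(broken))  # '\ufffd'
```

Which of the two failure modes you see depends on which side (file name or terminal) is using which encoding.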
But returning to the original question, I think Robert
did explain it very well: There is no real consensus
about what the different codings should mean. They
were meant to unify the representation of a very large
set of characters, but basically there are many inter-
pretations now, and how they show up to the user depends
on the font in use, _if_ it has this mapping or that.
For running ls, -w is the right option to use - but IN
COMBINATION with correct settings for the terminal
emulation AND the presence of a font that will do.
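As a rough model of why the "?" shows up, assuming the file name is stored as a single ISO 8859-1 byte 0xfc: the Python sketch below decodes the raw name under a given character set and replaces whatever is undecodable or unprintable with "?". This is not the actual ls code; the function name ls_style_name and the file name are hypothetical.

```python
def ls_style_name(raw, encoding):
    """Model of a default listing: unprintable characters become '?'."""
    text = raw.decode(encoding, errors="replace")
    return "".join(
        ch if ch.isprintable() and ch != "\ufffd" else "?"
        for ch in text
    )


raw_name = b"M\xfcller.mp3"  # hypothetical file name, 0xfc is u-umlaut in Latin-1

print(ls_style_name(raw_name, "ascii"))              # M?ller.mp3
print(ls_style_name(raw_name, "utf-8"))              # M?ller.mp3
print(ascii(ls_style_name(raw_name, "iso-8859-1")))  # 'M\xfcller.mp3'
```

Only when the assumed character set matches the bytes in the name does the umlaut survive; whether it is then drawn correctly is still up to the terminal emulation and the font.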
Again a fine demonstration why file names should be
limited to printable ASCII and no spaces if you want
them to work everywhere. :-)
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...