Compose key and xterm vs. UTF-8

Sun Mar 14 00:16:49 UTC 2010

Short:
------
Why do compose key sequences fail to work in a UTF-8 xterm?

Long:
-----
I have configured a compose key (aka "Multi_key") here:

$ setxkbmap -layout us -option compose:ralt

When I start an xterm with an ISO8859-1 locale
$ LC_CTYPE=en_US.ISO8859-1 xterm &
I can use the compose key to enter the expected set of characters
with diacritics. (a e i o u with acute accent, grave accent, etc.)
So far, so good.

When I start an xterm with an ISO8859-2 locale
$ LC_CTYPE=pl_PL.ISO8859-2 xterm &
xterm will accept compose sequences for 8859-2 characters such as
r with acute, l with stroke, etc., but display a different character
as if 8859-2 codes are mistaken for 8859-1.  Hmm.  Buuut... I see
in the man page that I need to tell xterm to use the locale settings:
$ LC_CTYPE=pl_PL.ISO8859-2 xterm -lc &
This works as expected: I can enter 8859-2 characters with compose
sequences.  (And I can set the XTerm*locale resource to make this
the default behavior.)
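
To illustrate the "8859-2 codes mistaken for 8859-1" effect, here is
a minimal sketch, assuming an iconv(1) that knows the charset names
ISO-8859-1 and ISO-8859-2 and a terminal that can display UTF-8.  The
helper name decode_byte is made up for this example.  It decodes the
same byte under both charsets:

```shell
#!/bin/sh
# decode_byte OCTAL CHARSET
# Hypothetical helper: emit the single byte given in octal and convert
# it from CHARSET to UTF-8, i.e. show which character that byte means.
decode_byte() {
    printf "\\$1" | iconv -f "$2" -t UTF-8
}

# Byte 0xB3 (octal 263): l with stroke in 8859-2,
# superscript three in 8859-1.
decode_byte 263 ISO-8859-2; echo
decode_byte 263 ISO-8859-1; echo

# Byte 0xE0 (octal 340): r with acute in 8859-2,
# a with grave in 8859-1.
decode_byte 340 ISO-8859-2; echo
decode_byte 340 ISO-8859-1; echo
```

If the compose sequence produces the right 8859-2 byte but the
terminal renders it as 8859-1, the second line of each pair is what
ends up on screen.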

Now, when I start an xterm with a UTF-8 locale
$ LC_CTYPE=en_US.UTF-8 xterm -lc &
xterm fails to accept compose sequences for common characters from
8859-1/2.  At first I thought none would work at all, but some
experimenting revealed a very few, such as --- for some sort of
dash and L- for a pound sign.  No common letters with diacritics,
though.  The language part of the locale setting doesn't matter;
en_US, de_DE, pl_PL, no difference.

This is disappointing.  You would expect a superset of the common
8859-x characters to be available with UTF-8.  Can anybody shed some
light on what's going on there?

-- 
Christian "naddy" Weisgerber                          naddy at mips.inka.de