Unicode-based FreeBSD

Svavar Lúthersson svavar at kjarrval.is
Tue Aug 26 00:55:04 UTC 2008

Alexander Churanov wrote:
> Svavar,
> I am trying to understand you.
> 2008/8/26 Svavar Lúthersson <svavar at kjarrval.is>
>     The Icelandic alphabet works in editors like pico but I have not
>     found another editor where it actually displays the characters
>     correctly. I checked edit, vi and vipw to be sure. It might be a
>     configuration problem or a lack of it but it's better for the user
>     experience if it works out-of-the-box. It should be enough to
>     configure it in one place and it should work "everywhere".
> Hmm. A minute ago I pressed Ctrl-Alt-F3, switched to the syscons
> console, started "emacs /tmp/test", where "test" was written in
> Russian, typed some Russian text into it, closed the editor and then
> ran "cat /tmp/test". No problems. I still cannot understand
> what the difference is between ISO-8859-1 and KOI8-R from the
> implementation point of view. It seems that I need to try configuring
> a system for Icelandic. I'll do that tomorrow on a dedicated box. I
> promise to help you with the configuration if it is at all possible.
>     The primary problem of the character support in syscons is
>     displaying specialised characters on the screen/tty. When I use
>     the special Icelandic characters in UTF-8, each character is
>     displayed as "??" which is very confusing to see if there are 2 or
>     more in a row...
> This is exactly what I am trying to solve, examining opinions on this 
> list at the same time.
>     Of course there are certain problems with changing the filenames
>     between languages like Russian and Icelandic since the normal
>     keyboard only has about 100 keys and cannot possibly contain all
>     the characters in the Unicode specification.
> There are special Input Methods for the rest of Unicode (more than 
> 200K code points currently assigned).
>     It however should not stop me from reading the filenames in the
>     language they were written. As for writing characters in other
>     languages, the "Windows approach" steps in the right direction by
>     enabling me to change the input language and therefore type in
>     characters I would not otherwise be able to with the Icelandic
>     keyboard. If the characters are translated to Unicode, it should
>     not matter what keyboard layout is used. As for how it would be
>     carried out in FreeBSD, I will leave it up to the developers.
> For switching I use CapsLock in the plain syscons console and
> Alt+Shift in X. By the way, how does Windows display non-ASCII
> characters in its plain-text console? I wonder whether it does better
> than what I have suggested.
>     The aforementioned is why I am suggesting that the system should
>     be moved directly to UTF-32. If it is moved to UTF-8 and there is
>     a need in the future for UTF-16 or -32, the conversion process has
>     to start again.
> I'm sure that it is not necessary. Again, all UTFs encode THE SAME 
> SET. But UTF-32 is better for single characters. And UTF-8 is better 
> for UNIX-like systems.
>     Like I mentioned in my former answer, the program writers do not
>     write Unicode compatible programs because there is almost no
>     Unicode support and the FreeBSD developers see little reason to
>     speed up Unicode implementation because there are so few
>     Unicode-compatible programs. Therefore I think that FreeBSD should
>     implement a Unicode support policy and move straight to UTF-32 and
>     make it the FreeBSD default. I am not pretending that this project
>     will be easy, painless and quick but it is better done sooner than
>     later. Said policy could begin by announcing an active plan for
>     Unicode support and suggest that every new FreeBSD project should
>     support Unicode. At the same time it should suggest the same to
>     other developers who write software for FreeBSD. When the time
>     is right or after further steps, the FreeBSD Foundation should
>     announce that after version X, Unicode will be the default charset. At
>     that time, the software which has Unicode support will (I hope)
>     work flawlessly with Unicode characters. Once UTF-32 is fully
>     supported in FreeBSD, the developers could wait for the end of
>     the support cycle of the first version with full UTF-32 support
>     and then make it the default in later versions. That way backward
>     compatibility would be preserved across all supported versions of
>     FreeBSD.
> Probably, this is useful, but I'm sure that this is out of scope of my 
> little project. I do not have enough power to enforce such a policy.
>     Með kveðju / With regards,
>     Svavar Kjarrval (svavar at kjarrval.is)
>     s. 863-9900
> Alexander Churanov

Not everything was directed to you so please do not take things 
personally. :þ (The Icelandic "þ" at work.)

I am not against your idea of adding Unicode support in general; my
concern is only that it does not go far enough. I did not misunderstand
your point that all the UTFs encode the same set, but we would end up in
the same situation when the time comes to add display and input support
for UTF-16 and UTF-32, although it would be slightly easier since no
actual conversion of existing characters would be needed. We still have
to think about the programs that only "upgrade" to UTF-8 and then have
to go through yet another change to support UTF-16 or UTF-32. It is much
easier in the long run to go all the way to UTF-32 to begin with. Going
to UTF-8 might fix some of the character issues, but we would be in the
same position when it comes to characters which are in -16 and -32 but
not in -8. I am not yet a user of X on FreeBSD (soon, I hope), so my
FreeBSD environment is limited to the console.
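
To make the encoding question concrete, here is a minimal Python sketch.
The sample characters and the helper names (encoded_sizes, round_trips,
as_seen_by_ascii) are my own assumptions for illustration, not anything
from syscons. It compares the byte length of the same code points in
UTF-8, UTF-16 and UTF-32, checks that each one survives a round trip
through every form, and shows how a two-byte UTF-8 character turns into
two replacement marks when read as plain ASCII, which would be
consistent with the "??" seen on the console.

```python
# Sample characters are assumptions chosen for illustration:
# Icelandic thorn and eth, a Cyrillic letter, and one character
# outside the Basic Multilingual Plane (U+1D11E).
SAMPLES = ["\u00fe", "\u00f0", "\u042f", "\U0001D11E"]

# The "-le" variants are used so that no byte-order mark is counted.
FORMS = {"utf-8": "utf-8", "utf-16": "utf-16-le", "utf-32": "utf-32-le"}


def encoded_sizes(ch):
    """Return the byte length of one character in each encoding form."""
    return {name: len(ch.encode(codec)) for name, codec in FORMS.items()}


def round_trips(ch):
    """True if the character survives encode/decode in every form."""
    return all(ch.encode(c).decode(c) == ch for c in FORMS.values())


def as_seen_by_ascii(ch):
    """Decode the UTF-8 bytes as ASCII, replacing each undecodable byte."""
    return ch.encode("utf-8").decode("ascii", errors="replace")


if __name__ == "__main__":
    for ch in SAMPLES:
        print(f"U+{ord(ch):04X}", encoded_sizes(ch), round_trips(ch),
              len(as_seen_by_ascii(ch)))
```

In this sketch "þ" (U+00FE) takes two bytes in UTF-8 and UTF-16 and four
in UTF-32, and an ASCII-only reader sees it as two unknown bytes.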

Windows (XP) displays the Icelandic alphabet with no problems at the
command prompt. I also have no problems typing it. Somebody else will
have to check whether other character sets, like Russian or Chinese,
work there as well.

If the result of this discussion is that UTF-8 support is enough, I
suggest that future expansion to UTF-16 and UTF-32 be taken into
account, so that it is as easy as possible for somebody to carry it out
later. If my programming were any better, I would have volunteered
for the job.

The policy suggestion was directed to the mailing list in general, not 
as a directive that you should or must enforce.

Með kveðju / With regards,
Svavar Kjarrval (svavar at kjarrval.is)
s. 863-9900
