Unicode-based FreeBSD

Svavar Lúthersson svavar at kjarrval.is
Mon Aug 25 23:20:55 UTC 2008

Alexander Churanov wrote:
> Svavar,
> You have to type "special characters" that are high-bit characters of
> ISO-8859-1 and -15. I have to type Cyrillic characters that are high-bit
> characters of koi8-r. But I am able to do this. Did you try "keymap" and
> "scrnmap" settings of "rc.conf"? I am not sure, but your issue looks like
> misconfiguration.
> Then, about UTFs. All three forms encode THE SAME set of code points and
> from user's perspective there is no great difference. However, UTF-8 is
> interoperable with ASCII and this fact makes many old applications work
> without modification. I've already posted information about my experience of
> using vipw with UTF-8 on FreeBSD 6.2 having LANG=ru_RU.KOI8-R to the list.
> The actual drawback of my solution is that a person will not be able to read
> and type Icelandic and Russian text simultaneously in syscons console. And
> that ideas of obscuring output are attempts to provide some way to
> manipulate files with, say, Russian names on a PC tuned for Icelandic text.
> Please note, that I DO NOT propagandize syscons character mode as a device
> for working correctly with multilingual texts. For some scripts, for
> example, Devanagari, syscons will NEVER work unless it is extended to
> something like X, freetype, freebidi and many other tools working together.
> Please, note that you can start working in true multilingual environment
> right now, using, for example, X+KDE (kate and konsole) and switching them
> to UTF-8. This will work.
> What I am trying to discuss is just making syscons working correctly if the
> whole system is switched to UTF-8. This will not affect X and KDE, but
> standard syscons FreeBSD console will fail to work correctly. Mainly the
> ideas are:
> 1) Make switching everything to UTF-8 possible.
> 2) Either map non-ASCII characters to 128-chars subset of full unicode range
>     Or encode them to sequences of ASCII chars.
>     Or mix these approaches.
> To my mind this should result in the following abilities:
> 1) To work in graphical environment without restrictions. (this is what you
> have right now)
> 2) To read and type some filenames (that contains only characters that are
> mappable to 8-bit font) in a natural way. (this is also possible now, but
> with 8-bit LANG, not UTF-8)
> 3) To read and type filenames that contain characters that do not fit in
> current 8-bit screenmap, possibly in an unnatural way.
> The latter would help if you are in Iceland and see a Chinese filename. I
> want engineers that do technical support of systems to be able to delete or
> rename such files even in single-user mode. I think that typing something
> like "#1234;#4321;" instead of actual hieroglyph is affordable price.
> I'm just trying to be realistic and provide doable solution. I leave plans
> of rewriting every bit of software to others. And I even think that the latter
> is not required, since syscons console is probably not heavily used now.
> Alexander Churanov

Again, I am not an expert in Unicode and I am not even suggesting that I 
know everything about FreeBSD. It is unfortunate that Alexander 
misunderstood me in some instances.

First, I mentioned that the Icelandic characters cannot be typed by
default in the console. This also applies when I configure the
installer to use the Icelandic keyboard and it adds
keymap="icelandic.iso.acc" to my rc.conf. The Icelandic alphabet works
in editors like pico, but I have not found another editor that
displays the characters correctly. I checked edit, vi and vipw
to be sure. It might be a configuration problem, or a lack of
configuration, but the user experience is better if it works out of
the box. It should be enough to configure it in one place and have it
work "everywhere".

The primary problem of the character support in syscons is displaying
special characters on the screen/tty. When I use the special
Icelandic characters in UTF-8, each character is displayed as "??",
which is very confusing when there are two or more in a row. One step in
fixing that would be to enable syscons to display the correct symbols.
If I press tab, it shows the symbol code for the characters. The problem
is worse when the filename begins with a character I cannot type in the
console, and I think it becomes even worse when displaying Cyrillic
characters. How can I know if a hieroglyph corresponds to a specific
character code? It could be offered as an alternative method of writing
filenames. There is no easy solution to the "tech support problem",
though. The drawback of your solutions is too great and I do not think
they should be carried out in the way you suggest.
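
To make the two points above concrete, here is a rough Python sketch.
It rests on assumptions of mine, not on how syscons is implemented: I
assume the "??" appears because each Icelandic character occupies two
bytes in UTF-8 and every high-bit byte is replaced by its own "?", and
I assume the "#1234;" notation means the decimal code point. The
function names are made up for illustration.

```python
import re


def show_on_8bit_console(text: str) -> str:
    """Mimic a byte-oriented display: one "?" per non-ASCII byte (assumed)."""
    return "".join(chr(b) if b < 0x80 else "?" for b in text.encode("utf-8"))


def escape_name(name: str) -> str:
    """Write non-ASCII characters (and "#" itself) as "#<decimal code point>;"."""
    return "".join(
        c if ord(c) < 0x80 and c != "#" else "#%d;" % ord(c) for c in name
    )


def unescape_name(escaped: str) -> str:
    """Reverse escape_name: turn every "#<digits>;" back into its character."""
    return re.sub(r"#(\d+);", lambda m: chr(int(m.group(1))), escaped)


if __name__ == "__main__":
    print(show_on_8bit_console("Þór"))   # two "?" per Icelandic letter
    print(escape_name("Þór.txt"))        # ASCII-only spelling of the name
    print(unescape_name(escape_name("文件#1.txt")))
```

Escaping "#" as well keeps the notation reversible, so a filename that
already contains "#" cannot be confused with an escape sequence.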

Of course there are certain problems with changing the filenames between
languages like Russian and Icelandic, since the normal keyboard only has
about 100 keys and cannot possibly contain all the characters in the
Unicode specification. That, however, should not stop me from reading the
filenames in the language they were written in. As for writing characters
in other languages, the "Windows approach" is a step in the right direction:
it lets me change the input language and therefore type
characters I would not otherwise be able to type with the Icelandic keyboard.
If the characters are translated to Unicode, it should not matter what
keyboard layout is used. As for how it would be carried out in FreeBSD,
I will leave it up to the developers.

The above is why I am suggesting that the system should be
moved directly to UTF-32. If it is moved to UTF-8 and there is a need in
the future for UTF-16 or -32, the conversion process has to start again.
As I mentioned in my previous reply, program writers do not write
Unicode-compatible programs because there is almost no Unicode support,
and the FreeBSD developers see little reason to speed up the Unicode
implementation because there are so few Unicode-compatible programs.
Therefore I think that FreeBSD should implement a Unicode support policy,
move straight to UTF-32 and make it the FreeBSD default. I am not
pretending that this project will be easy, painless and quick, but it is
better done sooner than later.

Said policy could begin by announcing an
active plan for Unicode support and suggesting that every new FreeBSD
project should support Unicode. At the same time it should suggest the
same to other developers who write software for FreeBSD. When the time
is right or after further steps, the FreeBSD Foundation should announce
that after version X, Unicode will be the default charset. At that time, the
software which has Unicode support will (I hope) work flawlessly with
Unicode characters. Once UTF-32 is fully supported in FreeBSD, the
developers could wait for the end of the support cycle for the first
version with full UTF-32 support and then make it the default in the
versions to come. That way backward compatibility would be good
across all supported versions of FreeBSD.
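
For anyone comparing the three forms, the following Python sketch shows
what they have in common and where they differ: the same text survives
a round trip through each encoding, while the byte counts and the
compatibility with plain ASCII are not the same. The helper names are
my own, and I picked the little-endian variants only to keep the
byte-order mark out of the counts.

```python
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")


def encoded_sizes(text: str) -> dict:
    """Return the number of bytes the text occupies in each encoding."""
    return {enc: len(text.encode(enc)) for enc in ENCODINGS}


def round_trips(text: str) -> bool:
    """Check that every encoding decodes back to the same code points."""
    return all(text.encode(enc).decode(enc) == text for enc in ENCODINGS)


def is_ascii_compatible(text: str, enc: str) -> bool:
    """True if pure-ASCII text has the same bytes in `enc` as in ASCII."""
    return text.encode(enc) == text.encode("ascii")


if __name__ == "__main__":
    print(encoded_sizes("Þór"))
    print(round_trips("Þór"))
    for enc in ENCODINGS:
        print(enc, is_ascii_compatible("rc.conf", enc))
```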

Með kveðju / With regards,
Svavar Kjarrval (svavar at kjarrval.is)
s. 863-9900
