Unicode-based FreeBSD

Mon Aug 25 02:59:00 UTC 2008

2008/8/24 Tz-Huan Huang <tzhuan at csie.org>

> I'm a Chinese living in Taiwan and I am probably sure that Unicode is larger
> than any other Chinese character sets (including traditional and simplified
> Chinese). The UTF-8 support in FreeBSD/Xorg is good enough for me.
> I can read/type all Unicode 4.0 characters (including CJKV extension A/B)
> in Firefox or any gtk/qt programs if I have the needed font; I can produce
> documents with any Unicode characters by LaTeX+CJK package.
> It's much better than MS IE and Word because IE and Word only support
> Unicode 2.0 (or maybe 3.0, I'm not so sure).
>
> There are two reasons to use any character sets other than UTF-8:
> 1. compatibility for old programs/services or other OS.
> 2. the old man wrote the document when Unicode was not so popular and
> newbies read the old document.
>
> UTF-8 is more and more popular in Chinese, at least in Taiwan.
> Almost everything works well in my daily jobs (of course under the X).
> The major missing part is the kiconv UTF-8 support -- currently the kiconv
> doesn't support more than two bytes character conversion so there
> is no UTF-8 support for Chinese (most Chinese characters are 3-byte or
> more). I should mount msdosfs/cd9660 in zh_TW.Big5 and convert the
> filename to UTF-8 by lint or screen.
>
> IMHO, If I need Chinese support, I'll go into X. I have no reason to use
> Chinese under console even if I can read/type in Chinese. I prefer Firefox
> rather than w3m or links. :-)
>
> Regards,
> Tz-Huan

Tz-Huan,

Working with Chinese text is the hard part of my solution (described in full
on freebsd-current at freebsd.org). In brief, it is about moving FreeBSD to
UTF-8 completely and making syscons map UTF-8 to a selected 8-bit charset for
display (a failsafe solution). It seems that this makes syscons somewhat more
usable for some people, but not for people from East Asia, am I right?

I have been thinking about how to make working with Chinese filenames possible
under syscons, but input from a native speaker/writer would help a lot, because
I know only the basic facts about the matter.

I see two alternatives for displaying Unicode code points that do not fit
into the selected 8-bit display charset:

1) Substituting some fixed character, like '?'. This is a very inexpensive
solution, but it makes working with files whose names do not fit into the
selected charset inconvenient.

2) Substituting the encoded code point value, like "#1234;". This is a more
complex solution if correct backspacing and similar things are required.
I am not ready to implement it.
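
To make the difference between the two alternatives concrete, here is a
minimal userland sketch in Python. It is only an illustration, not syscons
code: the function names are hypothetical, KOI8-R is just an assumed example
of a selected 8-bit charset, and reading "#1234;" as a hexadecimal code point
value is also an assumption. The second function additionally reports how many
screen cells each input character occupies, which is the information that
backspace handling would need.

```python
# Hypothetical sketch; the names and the hex "#XXXX;" format are assumptions.
DISPLAY_CHARSET = "koi8_r"  # assumed example of a selected 8-bit charset


def render_question_mark(text, charset=DISPLAY_CHARSET):
    """Alternative 1: replace every unrepresentable code point with '?'."""
    return text.encode(charset, errors="replace")


def render_code_point(text, charset=DISPLAY_CHARSET):
    """Alternative 2: replace unrepresentable code points with '#XXXX;'.

    Returns the bytes to display and, per input character, the number of
    screen cells it occupies (1 for a mapped character, more for a
    placeholder).
    """
    out = bytearray()
    widths = []
    for ch in text:
        try:
            cell = ch.encode(charset)
        except UnicodeEncodeError:
            # Code point is outside the display charset: emit a placeholder.
            cell = "#{:04X};".format(ord(ch)).encode("ascii")
        out += cell
        widths.append(len(cell))
    return bytes(out), widths


if __name__ == "__main__":
    name = "файл-中文.txt"
    print(render_question_mark(name).decode(DISPLAY_CHARSET))
    shown, widths = render_code_point(name)
    print(shown.decode(DISPLAY_CHARSET), widths)
```

With alternative 1 the two Chinese characters collapse into identical '?'
cells, so two different filenames can look the same. With alternative 2 they
stay distinguishable, but one character now takes several cells, which is
where the backspacing complexity comes from.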

In either case, it would be nice to have some "magic" implemented: copying
text with substituted code points and then pasting it would cause the
original UTF-8 sequence to be inserted.
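
For placeholders of the "#1234;" kind, the paste side of this "magic" could
look roughly like the Python sketch below. The function name is hypothetical,
and the placeholder format (a '#', four to six hexadecimal digits, then ';')
is an assumption, since the exact encoding is not fixed here. With '?'
substitution the original code point is lost on screen, so a real
implementation would have to remember it separately; this sketch does not
cover that case.

```python
import re

# Assumed placeholder format: '#', 4 to 6 hex digits, ';' (hypothetical).
PLACEHOLDER = re.compile(r"#([0-9A-Fa-f]{4,6});")


def restore_pasted(screen_text):
    """Turn placeholders in pasted screen text back into UTF-8 bytes."""

    def expand(match):
        value = int(match.group(1), 16)
        # Leave out-of-range or surrogate values untouched.
        if value > 0x10FFFF or 0xD800 <= value <= 0xDFFF:
            return match.group(0)
        return chr(value)

    return PLACEHOLDER.sub(expand, screen_text).encode("utf-8")


if __name__ == "__main__":
    pasted = "файл-#4E2D;#6587;.txt"
    print(restore_pasted(pasted).decode("utf-8"))
```

One limitation of this purely textual approach: a filename that literally
contains something like "#4E2D;" would also be rewritten on paste, so the
console would probably need to track which cells are real placeholders.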

For everyone else, I'd like to explain again that I'm not discussing correct
rendering of non-Latin scripts. It's not possible to render Devanagari in
character mode, and the approach that the Linux console takes is partial. The
cost of a full solution is something like X, freetype, freebidi and so on.

Tz-Huan, could you comment on the proposed solution? From your point of
view, are the proposed changes in syscons useful?

Again, this does not affect X, Firefox, etc., but it would make it possible to
have the whole system using UTF-8 out of the box.

Alexander Churanov