svn commit: r184691 - head/sys/compat/linprocfs

Wed Nov 5 15:26:44 PST 2008

2008/11/5 M. Warner Losh <imp at bsdimp.com>:
> In message: <200811051508.mA5F89XD030040 at svn.freebsd.org>
>            Dag-Erling Smorgrav <des at FreeBSD.org> writes:
> :   utf-8
>
> Is there some reason to prefer utf-8 over the 8-bit iso character set
> we were using?

Reason? You mean you actually *like* 8-bit code pages in the first place? :)

As a person from a country that has during its history decided it
really needs 3-4 dots and dashes in its alphabet that make it (the
alphabet) not representable in ASCII, and who has had Many Fun Days
converting between various 8-bit code pages, ISO standard or not, and
especially with deducing which code page is actually being used as all
bytes are created equal (and Microsoft just *had* to tweak two letters
from iso8859-2 into Latin2), I welcome UTF-8 with a warm room, a beer,
peanuts and a backrub.

UTF-8 (as opposed to old 8-bit code pages, which need to die as soon as
possible, and UTF-16, which got itself messed up with endianness) is
unambiguous. A sequence of proper UTF-8 bytes (and UTF-8 has a
structure, so not every random collection of bytes with the 8th bit set
is proper UTF-8) always maps to the same letter.
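
To make that concrete, here is a rough sketch in Python 3 using only the
standard codecs. The helper name is_valid_utf8 and the sample byte are
made up for illustration; this is not code from the commit. It shows one
high-bit byte that two 8-bit code pages read as different letters, and
that a strict UTF-8 decoder rejects the same byte outright:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# One byte with the 8th bit set. Nothing in the byte itself says which
# 8-bit code page it came from.
ambiguous = b"\xb9"

# Both decodes succeed, but they yield different letters.
as_iso = ambiguous.decode("iso8859_2")
as_cp1250 = ambiguous.decode("cp1250")
print("iso8859-2:", as_iso, "cp1250:", as_cp1250)

# The same lone byte is not proper UTF-8, so a strict decoder rejects it.
print("valid UTF-8?", is_valid_utf8(ambiguous))

# Encoding the iso8859-2 letter as UTF-8 gives a multi-byte sequence that
# round-trips to the same letter.
encoded = as_iso.encode("utf-8")
print("UTF-8 bytes:", encoded, "valid?", is_valid_utf8(encoded))
```

A successful UTF-8 decode is still only strong evidence, not proof, that
the text was written as UTF-8, but it is far more than any 8-bit code
page can offer.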

This is why there's such a big push to get systems to properly support
UTF-8. FreeBSD had a SoC project this year that was supposed to
properly implement Unicode collations (and thus collation of UTF-8
strings) but it looks dead or in a dormant state right now (though I
didn't follow it attentively).