SoC 2009: BSD-licensed libiconv in base system
Gabor Kovesdan
gabor at FreeBSD.org
Tue Apr 28 09:25:38 UTC 2009
David Schultz wrote:
> On Mon, Apr 27, 2009, Joerg Sonnenberger wrote:
>
>> On Mon, Apr 27, 2009 at 11:49:41AM -0700, Tim Kientzle wrote:
>>
>>> David Schultz wrote:
>>>
>>>> ... whether it would make more sense to standardize on something like
>>>> UCS-4 for the internal representation.
>>>>
>>> YES. Without this, wchar_t is useless.
>>>
>> I strongly disagree. Everything can be represented as UCS-4 is a bad
>> assumption, but something Americans and Europeans naturally don't have
>> to care about.
>>
>
> ...but isn't this moot at present because there are no
> widely-accepted encodings that include characters that
> aren't supported by UCS-4? Citrus doesn't seem to support
> any such encodings in any case.
>
Citrus is based on UCS-4 as an internal encoding, just like the other
BSD-licensed iconv library. This is a barrier to supporting encodings
whose characters UCS-4 cannot represent.
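To illustrate what "UCS-4 as an internal encoding" means in practice,
here is a minimal sketch of a pivot-style conversion: every source
encoding is decoded to a UCS-4 code point and every target encoding is
encoded from one. This is not Citrus code; the function names
(utf8_to_ucs4, ucs4_to_latin1, utf8_to_latin1) are hypothetical and the
two encodings were picked only to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence into a UCS-4 code point (the pivot).
 * Returns the number of bytes consumed, or 0 on malformed input. */
static size_t utf8_to_ucs4(const unsigned char *s, size_t n, uint32_t *cp)
{
    size_t len, i;
    uint32_t min;

    assert(s != NULL && cp != NULL);
    if (n == 0)
        return 0;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    if ((s[0] & 0xE0) == 0xC0)      { len = 2; *cp = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; *cp = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; *cp = s[0] & 0x07; min = 0x10000; }
    else
        return 0;
    if (n < len)
        return 0;
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates and out-of-range values. */
    if (*cp < min || *cp > 0x10FFFF || (*cp >= 0xD800 && *cp <= 0xDFFF))
        return 0;
    return len;
}

/* Encode one UCS-4 code point as ISO-8859-1.
 * Returns 1 on success, 0 if the target cannot represent it. */
static size_t ucs4_to_latin1(uint32_t cp, unsigned char *out)
{
    if (cp > 0xFF)
        return 0;
    *out = (unsigned char)cp;
    return 1;
}

/* Convert UTF-8 to ISO-8859-1 through the UCS-4 pivot.
 * Returns the number of bytes written, or (size_t)-1 on error. */
size_t utf8_to_latin1(const unsigned char *in, size_t inlen,
                      unsigned char *out, size_t outlen)
{
    size_t written = 0;

    assert(in != NULL && out != NULL);
    while (inlen > 0) {
        uint32_t cp;
        size_t used = utf8_to_ucs4(in, inlen, &cp);

        if (used == 0 || written == outlen)
            return (size_t)-1;
        if (ucs4_to_latin1(cp, out + written) == 0)
            return (size_t)-1;
        in += used;
        inlen -= used;
        written++;
    }
    return written;
}
```

With N encodings this design needs N decoders and N encoders instead of
N*N direct converters, but any character without a UCS-4 code point has
nowhere to go in the pivot step, which is the barrier described above.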
> If this ever really becomes an issue, we could always stuff
> locale-dependent encodings into unused UCS-4 code pages.
> However, it doesn't seem worthwhile to deliberately burden
> programmers over concerns that are presently, and for the
> foreseeable future, hypothetical.
>
I'm not a Unicode expert, but isn't the purpose of the periodic
revisions of the standard to cover more and more human languages? We
could just support the latest Unicode standard and let the Unicode
working groups map those new characters into unused code points. The
Latin-based, Cyrillic, Devanagari and CJK encodings are well supported,
I think. I don't know much about CJK encodings, though, or whether all
of the thousands of ideographs are supported. But I'd say the most
significant languages used on the Internet are supported; the rest
might have other problems...
[OFF]
It's possible that there are small, poor countries with their own
writing system, but their writing system is probably unsupported because
starvation, poverty and the lack of water and electricity are more
serious problems there. My ex-girlfriend is working in Nepal in a
cooperation program (it's kind of a scholarship) and she told me that
they only have electricity for 8 hours a day, 4 during the night and 4
during the day. There are no sidewalks for pedestrians, so they walk
along the street with the cars, and the pollution is extremely high.
Even so, this country's encoding is supported. What I am trying to say
is that countries with unsupported languages probably won't care much
about character encodings if they rarely have computers... I can only
hope that their living conditions will get better and their languages
will be supported. I can also hope that the Unicode people will focus
more on these countries instead of wasting time on fictional languages
from fairy tales... [1]
I'll probably go to visit her in Nepal in January; it will be an
interesting experience. I'll check whether I can help the IT world there
with anything.
[ON]
Another idea to consider: are all of our utilities wchar-clean? What
about library functions? (regex surely is not.) Do we lack any important
utility or library? (We still lack iconv and gettext, and what
else...?) What about standards, like the C99 wchar functions? Is
something missing? What about POSIX, if it has anything related?
Personally, I think that these are more important questions than
support for some extremely rare languages. It's worth considering how
to deal with them later, but the basic problems need a higher priority.
[1] http://en.wikipedia.org/wiki/Tengwar#Unicode
Cheers,
--
Gabor Kovesdan
FreeBSD Volunteer
EMAIL: gabor at FreeBSD.org .:|:. gabor at kovesdan.org
WEB: http://people.FreeBSD.org/~gabor .:|:. http://kovesdan.org