converting strings from utf8
Tim Kientzle
kientzle at freebsd.org
Wed Nov 5 13:37:15 PST 2008
Maksim Yevmenkin wrote:
>
> can i use wcstombs(3) to convert a string presented in utf8 into
> current locale? basically i'm looking for something like iconv from
> ports but included into base system.
This isn't as easy as it should be, unfortunately.
First, UTF-8 is itself a multibyte encoding, so you have
to convert it to wide characters before you can use
wcstombs(). You could in theory use the following:
* Use setlocale() to switch to a UTF-8 locale
* Use mbstowcs() to convert UTF-8 into wide characters
* Use setlocale() to switch to your preferred locale
* Use wcstombs() to convert wide characters to your locale
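The four steps above might look roughly like this in C. This is only
a sketch: the function name and the fixed 256-character buffer are
made up for illustration, and both locale names are passed in by the
caller because they differ between systems.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

#define WBUF_LEN 256	/* arbitrary limit chosen for this sketch */

/*
 * Hypothetical helper: convert the UTF-8 string src into the encoding
 * of target_locale.  Returns the number of bytes written (not counting
 * the terminating NUL), or (size_t)-1 on any failure.
 * Side effect: LC_CTYPE is left set to target_locale on success.
 */
size_t
utf8_to_locale_via_setlocale(const char *utf8_locale,
    const char *target_locale, const char *src, char *dst, size_t dstlen)
{
	wchar_t wbuf[WBUF_LEN];
	size_t n;

	assert(src != NULL && dst != NULL && dstlen > 0);

	/* Step 1: switch to a UTF-8 locale (the name is system-specific). */
	if (setlocale(LC_CTYPE, utf8_locale) == NULL)
		return ((size_t)-1);

	/* Step 2: UTF-8 bytes -> wide characters. */
	n = mbstowcs(wbuf, src, WBUF_LEN);
	if (n == (size_t)-1 || n >= WBUF_LEN)
		return ((size_t)-1);	/* invalid input or too long */

	/* Step 3: switch to the locale we actually want. */
	if (setlocale(LC_CTYPE, target_locale) == NULL)
		return ((size_t)-1);

	/* Step 4: wide characters -> bytes in that locale. */
	n = wcstombs(dst, wbuf, dstlen);
	if (n == (size_t)-1 || n >= dstlen)
		return ((size_t)-1);	/* unconvertible or no room for NUL */
	return (n);
}
```

A call such as utf8_to_locale_via_setlocale("en_US.UTF-8", "C", s, buf,
sizeof(buf)) only works if the system really has a locale by that name,
which is exactly the portability problem.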
Besides being ugly, the locale names themselves are not
standardized, so it's hard to do this portably. For a
lot of applications, the error handling in wcstombs() is
also troublesome; it rejects the entire string if any one
character can't be converted.
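To illustrate the all-or-nothing behavior, here is a tiny sketch (the
helper name is invented for this example). It only reports whether
wcstombs() accepted the whole string in the current locale.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/*
 * Hypothetical helper: returns 1 if wcstombs() can convert all of ws
 * in the current locale, 0 if it gives up.
 */
int
converts_cleanly(const wchar_t *ws)
{
	char buf[64];

	assert(ws != NULL);
	/* One unconvertible character makes the whole call return (size_t)-1. */
	return (wcstombs(buf, ws, sizeof(buf)) != (size_t)-1);
}
```

In the default "C" locale, a string containing something like U+20AC
should make this return 0 even though every other character in it is
plain ASCII.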
When I had to do this for libarchive, where the code had
to be very portable (which precluded using iconv), I ended
up doing the following:
* Wrote my own converter from UTF-8 to wide characters
  (fortunately, UTF-8 is pretty simple to decode; this
  is about 20-30 lines of C)
* Used wctomb() to convert one character at a time from
  wide characters to the current locale.
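A rough sketch of that two-part approach follows. It is not the
libarchive code; the function names are invented here, and it assumes
that wchar_t is wide enough for any code point and that wide character
values are Unicode code points, which not every platform guarantees.

```c
#include <assert.h>
#include <limits.h>	/* MB_LEN_MAX */
#include <stdlib.h>	/* wctomb */
#include <string.h>	/* strlen, memcpy */
#include <wchar.h>

/*
 * Decode one UTF-8 sequence from s (at most n bytes) into *cp.
 * Returns the number of bytes consumed, or -1 if the input is
 * malformed, truncated, overlong, a surrogate, or out of range.
 */
int
utf8_decode(const unsigned char *s, size_t n, unsigned long *cp)
{
	static const unsigned long min_cp[] = { 0, 0, 0x80, 0x800, 0x10000 };
	unsigned long c;
	int len, i;

	if (n == 0)
		return (-1);
	if (s[0] < 0x80)		{ c = s[0];        len = 1; }
	else if ((s[0] & 0xE0) == 0xC0)	{ c = s[0] & 0x1F; len = 2; }
	else if ((s[0] & 0xF0) == 0xE0)	{ c = s[0] & 0x0F; len = 3; }
	else if ((s[0] & 0xF8) == 0xF0)	{ c = s[0] & 0x07; len = 4; }
	else
		return (-1);		/* stray continuation or bad lead byte */
	if ((size_t)len > n)
		return (-1);
	for (i = 1; i < len; i++) {
		if ((s[i] & 0xC0) != 0x80)
			return (-1);
		c = (c << 6) | (s[i] & 0x3F);
	}
	if (c < min_cp[len] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
		return (-1);
	*cp = c;
	return (len);
}

/*
 * Convert the NUL-terminated UTF-8 string src to the current locale,
 * one character at a time.  Anything that cannot be decoded or cannot
 * be represented becomes '?'.  Output is truncated to fit dstlen.
 * Returns the number of bytes written, not counting the NUL.
 */
size_t
utf8_to_current_locale(const char *src, char *dst, size_t dstlen)
{
	const unsigned char *p = (const unsigned char *)src;
	size_t left = strlen(src), out = 0;
	char mb[MB_LEN_MAX];
	unsigned long cp;
	int used, w;

	assert(dst != NULL && dstlen > 0);
	(void)wctomb(NULL, 0);			/* reset any shift state */
	while (left > 0) {
		used = utf8_decode(p, left, &cp);
		if (used < 0) {			/* bad byte: emit '?', skip it */
			cp = '?';
			used = 1;
		}
		w = wctomb(mb, (wchar_t)cp);	/* assumes wchar_t == code point */
		if (w <= 0) {			/* not representable here */
			(void)wctomb(NULL, 0);
			mb[0] = '?';
			w = 1;
		}
		if (out + (size_t)w >= dstlen)
			break;			/* keep room for the NUL */
		memcpy(dst + out, mb, (size_t)w);
		out += (size_t)w;
		p += used;
		left -= (size_t)used;
	}
	dst[out] = '\0';
	return (out);
}
```

Because each character is handled separately, the '?' substitution
could just as easily be replaced with some kind of escape sequence.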
I've found that wctomb() is more portable than a lot of
the other functions (it's in C89, whereas the restartable
conversion routines such as wcrtomb() and wcsrtombs() only
arrived with the 1995 amendment and C99) and provides better
error-handling capabilities since it operates on one
character at a time (so you can, for instance, convert
characters that aren't supported in the current locale
into '?' or some kind of \-escape).
Feel free to copy any of my code from libarchive if it helps.
Tim