gb18030(5) manual page for review

Tim Robbins tjr at FreeBSD.ORG
Wed Aug 6 22:55:43 PDT 2003


I noticed that support for the GB18030 encoding was recently committed. I had
already implemented it in a Perforce branch, along with the rest of my planned
overhaul of the character encoding functions in libc for FreeBSD 6. The only
thing that my implementation has that Robin Hu's doesn't is a manual page :-)

I've attached my manual page, which I plan to commit in the next week or so.
I'd appreciate comments from Chinese speakers or anyone who's generally
clueful when it comes to character encodings.

BTW, just to save duplication of effort in the future: I've already
implemented the ISO-2022-CN and ISO-2022-JP encodings and all the related
state-dependent encoding support, and will probably commit them once
6.0-current is created.


Thanks,

Tim


.\" [copyright header trimmed for mail]
.\"
.\" $FreeBSD$
.Dd March 30, 2003
.Dt GB18030 5
.Os
.Sh NAME
.Nm gb18030
.Nd "GB 18030 encoding method for Chinese text"
.Sh SYNOPSIS
.Nm ENCODING
.Qq GB18030
.Sh DESCRIPTION
The
.Nm GB18030
encoding implements GB 18030-2000, a PRC National Standard for the encoding of
Chinese characters.
It is a superset of the older GB 2312-80 and GBK encodings.
.Pp
Multibyte characters in the GB18030 encoding can be one byte, two bytes, or
four bytes long.
In total, there are over 1.5 million code positions.
.Pp
The
.Tn ASCII
character set is represented by a single byte in the range 0x00 to 0x7F.
.Pp
Chinese characters are represented as either two bytes or four bytes.
Characters which are represented by two bytes begin with a byte in the range
0x81-0xFE and end with a byte in either the range 0x40-0x7E or the range
0x80-0xFE.
.Pp
Characters which are represented by four bytes begin with a byte in the range
0x81-0xFE, have a second byte in the range 0x30-0x39, a third byte in the range
0x81-0xFE and a fourth byte in the range 0x30-0x39.
.Sh SEE ALSO
.Xr euc 4 ,
.Xr utf8 5
.Rs
.%T "PRC National Standard GB 18030-2000"
.%D "March 2000"
.Re
.Sh STANDARDS
The
.Nm
encoding is believed to be compatible with GB 18030-2000.
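
For reviewers who would rather check the byte ranges in code than in prose,
here is a rough sketch of the rules from the DESCRIPTION section above. It is
not taken from either implementation mentioned in this mail; the function name
gb18030_seq_len and its return convention are made up for illustration, and it
only classifies sequence lengths without mapping anything to a code point.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of libc): classify the GB18030 sequence
 * at the start of s, where n bytes are available, using only the byte
 * ranges given in the manual page.
 *
 * Returns 1, 2 or 4 for a well-formed sequence, 0 if more bytes are
 * needed to decide, and -1 if the bytes match none of the ranges.
 */
static int
gb18030_seq_len(const unsigned char *s, size_t n)
{
	if (n == 0)
		return (0);
	/* Single byte: ASCII, 0x00-0x7F. */
	if (s[0] <= 0x7f)
		return (1);
	/* Lead byte of a two- or four-byte sequence: 0x81-0xFE. */
	if (s[0] < 0x81 || s[0] > 0xfe)
		return (-1);
	if (n < 2)
		return (0);
	/* Two-byte form: second byte in 0x40-0x7E or 0x80-0xFE. */
	if ((s[1] >= 0x40 && s[1] <= 0x7e) || (s[1] >= 0x80 && s[1] <= 0xfe))
		return (2);
	/* Four-byte form: second byte in 0x30-0x39. */
	if (s[1] < 0x30 || s[1] > 0x39)
		return (-1);
	if (n < 3)
		return (0);
	/* Third byte in 0x81-0xFE. */
	if (s[2] < 0x81 || s[2] > 0xfe)
		return (-1);
	if (n < 4)
		return (0);
	/* Fourth byte in 0x30-0x39. */
	if (s[3] < 0x30 || s[3] > 0x39)
		return (-1);
	return (4);
}
```

The same ranges account for the "over 1.5 million" figure: the four-byte form
alone gives 126 * 10 * 126 * 10 = 1,587,600 positions, the two-byte form adds
126 * 190 = 23,940, and the single-byte form adds 128.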

