svn commit: r265095 - head/lib/libc/locale
Jilles Tjoelker
jilles at stack.nl
Wed Apr 30 21:10:32 UTC 2014
On Tue, Apr 29, 2014 at 03:25:57PM +0000, Pedro F. Giffuni wrote:
> Author: pfg
> Date: Tue Apr 29 15:25:57 2014
> New Revision: 265095
> URL: http://svnweb.freebsd.org/changeset/base/265095
> Log:
> citrus: Avoid invalid code points.
>
> From the OpenBSD log:
> The UTF-8 decoder should not accept byte sequences which decode to unicode
> code positions U+D800 to U+DFFF (UTF-16 surrogates), U+FFFE, and U+FFFF.
> http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
> http://unicode.org/faq/utf_bom.html#utf8-4
> Reported by: Stefan Sperling
> Obtained from: OpenBSD
> MFC after: 5 days
> Modified:
> head/lib/libc/locale/utf8.c
> Modified: head/lib/libc/locale/utf8.c
> ==============================================================================
> --- head/lib/libc/locale/utf8.c Tue Apr 29 15:12:23 2014 (r265094)
> +++ head/lib/libc/locale/utf8.c Tue Apr 29 15:25:57 2014 (r265095)
> @@ -203,6 +203,14 @@ _UTF8_mbrtowc(wchar_t * __restrict pwc,
>  		errno = EILSEQ;
>  		return ((size_t)-1);
>  	}
> +	if ((wch >= 0xd800 && wch <= 0xdfff) ||
> +	    wch == 0xfffe || wch == 0xffff) {
> +		/*
> +		 * Malformed input; invalid code points.
> +		 */
> +		errno = EILSEQ;
> +		return ((size_t)-1);
> +	}
>  	if (pwc != NULL)
>  		*pwc = wch;
>  	us->want = 0;
Hmm, I think U+FFFE and U+FFFF should be passed through normally.
According to http://www.unicode.org/faq/private_use.html they are
"noncharacters" (basically a more private variant of private-use
characters), and the UTFs must still convert them like any other code
point.
The part that rejects U+D800 to U+DFFF is definitely correct, though.
http://unicode.org/faq/utf_bom.html#utf8-4 says to do only that.
The part about U+FFFE and U+FFFF in
http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8 seems out of date.
Note the last modified date of that page: 2009-05-11.
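To make the suggestion concrete, here is a minimal sketch of the narrower
check: it rejects only the surrogate range and lets U+FFFE and U+FFFF
through. The function name utf8_wch_acceptable and the use of uint32_t
instead of wchar_t are assumptions for illustration; this is not a tested
patch against utf8.c.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of utf8.c: returns true if a decoded
 * value may be handed back to the caller.  Only the UTF-16 surrogate
 * range is refused; the noncharacters U+FFFE and U+FFFF are accepted.
 */
static bool
utf8_wch_acceptable(uint32_t wch)
{
	if (wch >= 0xd800 && wch <= 0xdfff)
		return (false);	/* surrogates are never valid in UTF-8 */
	return (true);
}
```

In the decoder, the caller would set errno = EILSEQ and return (size_t)-1
when this returns false, as the committed code already does.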
On another note, everything above U+0010FFFF should perhaps be rejected
since those codes, which cannot be encoded in UTF-16, were excluded from
Unicode and ISO 10646.
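The UTF-16 limit follows from the surrogate pair arithmetic: a pair
carries 20 bits on top of an offset of 0x10000, so the largest reachable
value is 0x10FFFF. A rough sketch, where utf16_encode and UNICODE_MAX are
hypothetical names used only for illustration and not anything in libc:

```c
#include <stddef.h>
#include <stdint.h>

#define	UNICODE_MAX	0x10ffffu	/* highest value a surrogate pair can express */

/*
 * Hypothetical helper: encode wch as UTF-16 into out[].
 * Returns the number of 16-bit units written, or 0 if wch
 * cannot be represented (surrogate or above UNICODE_MAX).
 */
static size_t
utf16_encode(uint32_t wch, uint16_t out[2])
{
	if (wch >= 0xd800 && wch <= 0xdfff)
		return (0);
	if (wch > UNICODE_MAX)
		return (0);
	if (wch < 0x10000) {
		out[0] = (uint16_t)wch;
		return (1);
	}
	wch -= 0x10000;		/* at most 0xfffff now, i.e. 20 bits */
	out[0] = (uint16_t)(0xd800 + (wch >> 10));	/* high surrogate */
	out[1] = (uint16_t)(0xdc00 + (wch & 0x3ff));	/* low surrogate */
	return (2);
}
```

With this, 0x10FFFF encodes as the pair 0xDBFF 0xDFFF, the last possible
one, and 0x110000 has no encoding at all.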
--
Jilles Tjoelker