[Bug 289370] wcsxfrm() fails with EINVAL for some characters

From: <bugzilla-noreply_at_freebsd.org>
Date: Mon, 08 Sep 2025 20:32:53 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=289370

--- Comment #6 from Mark Millard <marklmi26-fbsd@yahoo.com> ---
(In reply to Serhiy Storchaka from comment #3)

UTF-8 has the following encoding layout:

Code point to/from UTF-8 conversion

First code point  Last code point  Byte 1    Byte 2    Byte 3    Byte 4
U+0000            U+007F           0yyyzzzz
U+0080            U+07FF           110xxxyy  10yyzzzz
U+0800            U+FFFF           1110wwww  10xxxxyy  10yyzzzz
U+010000          U+10FFFF         11110uvv  10vvwwww  10xxxxyy  10yyzzzz

L'\u00C5' (a.k.a. U+00C5) is in the range:
U+0080 to U+07FF

That range uses 2 bytes for the UTF-8 encoding, not one:
110xxxyy        10yyzzzz
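
To make the byte count concrete, here is a minimal sketch of an encoder that follows the table's bit layout. The name utf8_encode is hypothetical and is not part of FreeBSD's libc; the code is only meant to illustrate the arithmetic, not how the locale code does it.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a libc function): encode one Unicode code
 * point as UTF-8 into out[]. Returns the number of bytes written (1-4),
 * or 0 if cp is a surrogate or lies beyond U+10FFFF.
 */
size_t
utf8_encode(uint32_t cp, unsigned char out[4])
{
	if (cp <= 0x7F) {			/* 0yyyzzzz */
		out[0] = (unsigned char)cp;
		return (1);
	}
	if (cp <= 0x7FF) {			/* 110xxxyy 10yyzzzz */
		out[0] = (unsigned char)(0xC0 | (cp >> 6));
		out[1] = (unsigned char)(0x80 | (cp & 0x3F));
		return (2);
	}
	if (cp <= 0xFFFF) {			/* 1110wwww 10xxxxyy 10yyzzzz */
		if (cp >= 0xD800 && cp <= 0xDFFF)
			return (0);		/* surrogates are not encodable */
		out[0] = (unsigned char)(0xE0 | (cp >> 12));
		out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		out[2] = (unsigned char)(0x80 | (cp & 0x3F));
		return (3);
	}
	if (cp <= 0x10FFFF) {			/* 11110uvv 10vvwwww 10xxxxyy 10yyzzzz */
		out[0] = (unsigned char)(0xF0 | (cp >> 18));
		out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
		out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		out[3] = (unsigned char)(0x80 | (cp & 0x3F));
		return (4);
	}
	return (0);
}
```

Calling utf8_encode(0x00C5, buf) returns 2 and stores the bytes 0xC3 0x85, which matches the two-byte row of the table.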


As far as I can tell, U+00C5 is not an example of a
"single-byte LC_CTYPE".

It looks like the BUGS section that I referenced does apply.
