[Bug 289370] wcsxfrm() fails with EINVAL for some characters

From: <bugzilla-noreply_at_freebsd.org>
Date: Mon, 08 Sep 2025 09:01:26 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=289370

            Bug ID: 289370
           Summary: wcsxfrm() fails with EINVAL for some characters
           Product: Base System
           Version: 14.2-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Many People
          Priority: ---
         Component: standards
          Assignee: standards@FreeBSD.org
          Reporter: storchaka@gmail.com
        Attachment: #263597 (mime type: text/plain)

Created attachment 263597
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=263597&action=edit
Reproducer in C

The C function wcsxfrm() fails with errno set to EINVAL for some characters in
non-POSIX locales. See the reproducer in the attachment.

For example, in en_US.UTF-8 two characters are affected: 'Å' (U+00C5 LATIN
CAPITAL LETTER A WITH RING ABOVE) and 'Å' (U+212B ANGSTROM SIGN).

```
$ LC_ALL=en_US.UTF-8 ./wcsxfrm-test
U+00C5
U+212B
```

For ar_EG.UTF-8, el_GR.UTF-8, ja_JP.UTF-8 and ro_RO.UTF-8 the list is much
longer. To list the affected characters for every installed locale, run:

```
$ for loc in $(locale -a); do LC_ALL="$loc" ./wcsxfrm-test || echo "=== $loc"; done
```

This issue was discovered on the CPython bug tracker:
https://github.com/python/cpython/issues/130567#issuecomment-3262769733.
