[Bug 268189] BSD tar incorectly encode UTF-8 sequences
Date: Tue, 06 Dec 2022 08:22:58 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=268189
Bug ID: 268189
Summary: BSD tar incorectly encode UTF-8 sequences
Product: Base System
Version: 13.1-RELEASE
Hardware: Any
OS: Any
Status: New
Severity: Affects Many People
Priority: ---
Component: bin
Assignee: bugs@FreeBSD.org
Reporter: aeder@list.ru
BSD tar incorrectly encodes UTF-8 sequences.
How to repeat:
Create two directories with (UTF-8) names:
d0 bf d0 be d0 bb d0 b5 d0 b2 d0 be d0 b8 cc 86
d0 bf d0 be d0 bb d0 b5 d0 b2 d0 be d0 b9
("полевой" and "полевой"). They look exactly the same, but they are actually
different names.
The difference is that the sequence 'd0 b9' encodes the Cyrillic letter 'й',
while 'd0 b8 cc 86' actually encodes two code points: the Cyrillic letter 'и'
and a combining diacritic (U+0306 COMBINING BREVE), which I can't enter here.
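The relationship between the two byte sequences can be checked with a short
Python sketch. It is only an illustration of the encoding difference described
above, not part of the original report; the variable names are made up for the
example, and it assumes the byte sequences are exactly as listed.

```python
import unicodedata

# Byte sequences copied from the report (first: decomposed, second: precomposed).
decomposed = bytes.fromhex("d0 bf d0 be d0 bb d0 b5 d0 b2 d0 be d0 b8 cc 86").decode("utf-8")
precomposed = bytes.fromhex("d0 bf d0 be d0 bb d0 b5 d0 b2 d0 be d0 b9").decode("utf-8")

# The two names differ as code point sequences (and therefore as file names) ...
names_differ = decomposed != precomposed

# ... but become identical once Unicode normalization is applied.
same_after_nfc = unicodedata.normalize("NFC", decomposed) == precomposed
same_after_nfd = unicodedata.normalize("NFD", precomposed) == decomposed

# Show the final code points of each name for inspection.
print([f"U+{ord(c):04X}" for c in decomposed[-2:]])
print([f"U+{ord(c):04X}" for c in precomposed[-1:]])
```

Any tool that normalizes file names to a single Unicode form would map both
names to the same string, which is consistent with the symptom described here.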
You can create such directories or files, but when they are archived with BSD
tar, the second name is replaced by the first name.
Adding the --posix option or setting LC_ALL=C doesn't help.
GNU tar handles such files correctly, as separate files/directories.
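The reproduction steps above could be scripted roughly as follows. This is a
sketch under several assumptions, not a tested reproduction of the bug: the
helper names make_dirs and archive_members are hypothetical, the file system
is assumed to store names byte-for-byte without normalizing them, and the tar
command under test is assumed to be on PATH. Python's tarfile module is used
as a neutral reader so that the listing does not depend on how the tar under
test prints names.

```python
import io
import os
import subprocess
import tarfile

# The two directory names from the report, as raw bytes.
NAMES = [
    bytes.fromhex("d0 bf d0 be d0 bb d0 b5 d0 b2 d0 be d0 b8 cc 86"),
    bytes.fromhex("d0 bf d0 be d0 bb d0 b5 d0 b2 d0 be d0 b9"),
]


def make_dirs(root: bytes) -> list[bytes]:
    """Create both directories under root and return the names found on disk."""
    for name in NAMES:
        os.mkdir(os.path.join(root, name))
    return sorted(os.listdir(root))


def archive_members(root: bytes, tar_cmd: str = "tar") -> list[bytes]:
    """Archive root with tar_cmd and return the member names as UTF-8 bytes."""
    # -c, -f - and -C are accepted by both BSD tar and GNU tar.
    archive = subprocess.run(
        [tar_cmd, "-cf", "-", "-C", os.fsdecode(root), "."],
        check=True,
        capture_output=True,
    ).stdout
    # Read the archive back with Python rather than with the tar under test.
    with tarfile.open(fileobj=io.BytesIO(archive), errors="surrogateescape") as tf:
        return [m.name.encode("utf-8", "surrogateescape") for m in tf.getmembers()]
```

Calling make_dirs() on a fresh temporary directory should report two distinct
names. If archive_members() then returns the same name twice for the tar being
tested, that would match the behaviour described in this report; two distinct
names would match what is described for GNU tar.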
I think at least --posix (or some other option) should make it possible to
COMPLETELY disable all filename encoding/decoding operations.
The problem also occurs in 12.3-RELEASE, but seems to be absent in the 10.x releases.
--
You are receiving this mail because:
You are the assignee for the bug.