[Bug 200398] iconv(3) support of UTF-7 is broken

bugzilla-noreply at freebsd.org
Fri May 22 21:52:41 UTC 2015


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=200398

            Bug ID: 200398
           Summary: iconv(3) support of UTF-7 is broken
           Product: Base System
           Version: 10.1-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: bin
          Assignee: freebsd-bugs at FreeBSD.org
          Reporter: delphij at FreeBSD.org

Created attachment 157059
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=157059&action=edit
Patch by tijl@

(This is mainly for tracking purposes.)

I have observed this issue with dovecot, whose index worker would crash with:

Panic: file charset-iconv.c: line 132 (charset_to_utf8): assertion
failed: (*src_size - pos <= CHARSET_MAX_PENDING_BUF_SIZE)

Having been annoyed by this for some time, I decided to instrument the code
to figure out what was happening under the hood.  Eventually, I discovered
that if iconv(3) is asked to convert two UTF-7 strings to UTF-8 in succession:

"+ADw-SPAN+AD4-"

and

"+ADw-SPAN lang"

the second conversion gives wrong results, whereas the GNU implementation of
iconv(3) does not have this issue.  In gdb, the UTF-7 mode was 1 (shift)
when processing the second string, whereas it should have been 0, so calling
iconv(cd, NULL, NULL, NULL, NULL) between the two conversions to reset the
conversion state mitigates the issue.
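
A minimal sketch of the reproduction and of the mitigation, assuming both
strings are converted with the same conversion descriptor.  The helper names
utf7_to_utf8 and convert_pair are made up for illustration; they are not part
of dovecot or libc, and this is not the instrumentation used for the report.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: convert one NUL-terminated UTF-7 string to UTF-8
 * with an already opened descriptor.  Returns 0 on success, -1 on error.
 * The output is always NUL-terminated when outsz > 0.
 */
static int
utf7_to_utf8(iconv_t cd, const char *in, char *out, size_t outsz)
{
	/* POSIX declares inbuf as char **; some older headers use
	   const char ** instead and may need a different cast. */
	char *src = (char *)in;
	char *dst = out;
	size_t srcleft = strlen(in);
	size_t dstleft, r;

	if (outsz == 0)
		return (-1);
	dstleft = outsz - 1;	/* keep room for the terminating NUL */

	r = iconv(cd, &src, &srcleft, &dst, &dstleft);
	*dst = '\0';
	return (r == (size_t)-1 ? -1 : 0);
}

/*
 * Hypothetical driver: convert the two strings from this report with one
 * descriptor and store the second result in out2.  With reset != 0 the
 * descriptor is put back into its initial state between the conversions.
 */
static int
convert_pair(int reset, char *out2, size_t out2sz)
{
	char out1[64];
	iconv_t cd;
	int rv;

	cd = iconv_open("UTF-8", "UTF-7");	/* (tocode, fromcode) */
	if (cd == (iconv_t)-1)
		return (-1);

	/* The first result is deliberately not checked: an affected libc
	   may report the trailing "-" as incomplete input. */
	(void)utf7_to_utf8(cd, "+ADw-SPAN+AD4-", out1, sizeof(out1));

	if (reset)
		(void)iconv(cd, NULL, NULL, NULL, NULL);

	rv = utf7_to_utf8(cd, "+ADw-SPAN lang", out2, out2sz);
	iconv_close(cd);
	return (rv);
}
```

A correct conversion of the second string is "<SPAN lang".  On an affected
system, convert_pair(0, ...) is expected to produce something else, while
convert_pair(1, ...) applies the mitigation described above.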

I then asked Tijl Coosemans <tijl@>, who quickly found the problem and
created a patch.  Quote:

===
_citrus_UTF7_mbtoutf16 stored the decoder state at the beginning so it
could restore this state on an incomplete character such that the next
call would restart the decoding.  The problem was that "-" at the end
of a string was also treated as an incomplete character but was also
removed from the state buffer.  So the initial state would be restored
(with base64 mode) but the next call would no longer see the "-" and
thus continued in base64 mode.

This state saving/restoring isn't needed here.  It's already handled
elsewhere (citrus_iconv_std.c:_citrus_iconv_std_iconv_convert) so the
patch removes it.

The patch also improves the decoding of 4 byte UTF16 characters.  If
only 2 bytes can be read it is treated as an incomplete character now
(returning an error) whereas before it would be treated as a shift
sequence (not an error).  A range check has been added for the low 2
bytes as well.
===
