[Bug 232374] /bin/sh can not handle ja_JP.eucJP character code

bugzilla-noreply at freebsd.org bugzilla-noreply at freebsd.org
Thu Nov 8 13:08:44 UTC 2018


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=232374

Yuichiro NAITO <naito.yuichiro at gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |naito.yuichiro at gmail.com

--- Comment #2 from Yuichiro NAITO <naito.yuichiro at gmail.com> ---
My investigation shows that the main cause of this problem is that the
read_char() function does not retry read(2) on STDIN when mbrtowc(3) returns
(size_t)-2, that is, when the bytes read so far form an incomplete multibyte
character. In lib/libedit/read.c, the following code retries only when the
CHARSET_IS_UTF8 flag is set.

```
                switch (ct_mbrtowc(cp, cbuf, cbp)) {
<snip>
                case (size_t)-2:
                        /*
                         * We don't support other multibyte charsets.
                         * The second condition shouldn't happen
                         * and is here merely for additional safety.
                         */
                        if ((el->el_flags & CHARSET_IS_UTF8) == 0 ||
                            cbp >= MB_LEN_MAX) {
                                errno = EILSEQ;
                                *cp = L'\0';
                                return -1;
                        }
                        /* Incomplete sequence, read another byte. */
                        goto again;
```

Of course, the CHARSET_IS_UTF8 flag is not set in an eucJP environment.
With the CHARSET_IS_UTF8 flag check removed, /bin/sh reads eucJP characters
correctly.
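
As an illustration only (this is not the libedit code), the retry on
(size_t)-2 can be sketched in a charset-independent way. decode_one() is a
hypothetical helper, and unlike the snippet above it keeps an mbstate_t across
calls and feeds one byte at a time instead of re-parsing the accumulated
buffer.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: decode one character from buf, one byte at a time,
 * in the current LC_CTYPE locale. Returns the number of bytes consumed,
 * or -1 with errno set to EILSEQ.
 */
static int
decode_one(const char *buf, size_t len, wchar_t *out)
{
	mbstate_t st;
	size_t used = 0;

	assert(out != NULL);
	memset(&st, 0, sizeof(st));	/* start in the initial shift state */

	while (used < len && used < (size_t)MB_LEN_MAX) {
		size_t r = mbrtowc(out, buf + used, 1, &st);

		used++;
		if (r == (size_t)-2)
			continue;	/* incomplete sequence, take another byte */
		if (r == (size_t)-1) {
			errno = EILSEQ;	/* invalid sequence in this locale */
			return -1;
		}
		return (int)used;	/* complete character stored in *out */
	}

	/* Ran out of input (or hit MB_LEN_MAX) in the middle of a character. */
	errno = EILSEQ;
	*out = L'\0';
	return -1;
}
```

With LC_CTYPE set to ja_JP.eucJP through setlocale(3), a two-byte eucJP
character should make the first mbrtowc(3) call return (size_t)-2 and the
second call complete it, so decode_one() would return 2.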

However, removing the CHARSET_IS_UTF8 flag check exposes another problem:
the command history miscalculates the length of eucJP characters, because
the ct_enc_width() function in chartype.c does not understand any charset
other than UTF-8.

I rewrote ct_enc_width() to use wctomb(3), which fixes the command history
problem.
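
A minimal sketch of the wctomb(3) approach, assuming a hypothetical function
named enc_width() (the actual change is in the Phabricator review and may
differ): the encoded length is taken from the current locale rather than
computed from UTF-8 code point ranges.

```c
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for ct_enc_width(): number of bytes needed to
 * encode c in the current LC_CTYPE locale, or 0 if c cannot be encoded.
 */
static size_t
enc_width(wchar_t c)
{
	char buf[MB_LEN_MAX];
	int n = wctomb(buf, c);

	if (n < 0) {
		(void)wctomb(NULL, 0);	/* reset the internal shift state */
		return 0;
	}
	return (size_t)n;
}
```

In the default "C" locale enc_width(L'A') is 1. In a ja_JP.eucJP locale a
kanji would be expected to report 2 bytes, where a UTF-8-only calculation
would report 3.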

With these two changes, the CHARSET_IS_UTF8 flag is no longer needed.
The CHARSET_IS_UTF8 flag controls the NARROW_HISTORY flag, and the
NARROW_HISTORY flag is used only in the HIST_FUN definition.

```
#ifdef WIDECHAR
#define HIST_FUN(el, fn, arg) \
    (((el)->el_flags & NARROW_HISTORY) ? hist_convert(el, fn, arg) : \
        HIST_FUN_INTERNAL(el, fn, arg))
#else
#define HIST_FUN(el, fn, arg) HIST_FUN_INTERNAL(el, fn, arg)
#endif
```

In a WIDECHAR environment, hist_convert() should always be called,
because hist_convert() is a multibyte-aware function.

I have opened a new differential on Phabricator that contains all of my fixes.

  https://reviews.freebsd.org/D17903

I believe my fix solves this problem and does not affect charsets other than
eucJP.
Please review my code.

Hirabayashi-san:
 Could you please try my patch from Phabricator and check whether this problem
is fixed?
 I don't think /bin/sh itself is at fault.
