[Bug 202290] /usr/bin/vi conversion error on valid character
bugzilla-noreply at freebsd.org
Thu Aug 13 19:57:48 UTC 2015
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=202290
--- Comment #1 from lampa at fit.vutbr.cz ---
Looking at /usr/src/contrib/nvi/common/exf.c:
file_encinit(SCR *sp)
...
    if (looks_utf8(buf, blen) > 1)
        o_set(sp, O_FILEENCODING, OS_STRDUP, "utf-8", 0);
    else if (!O_ISSET(sp, O_FILEENCODING) ||
        !strncasecmp(O_STR(sp, O_FILEENCODING), "utf-8", 5))
        o_set(sp, O_FILEENCODING, OS_STRDUP, codeset(), 0);
    conv_enc(sp, O_FILEENCODING, 0);
}
1. There is no way to disable automatic encoding detection: if looks_utf8()
returns 2, fileencoding is forced to utf-8 and you are stuck with it! You can
set fileencoding in your .exrc, but it will be ignored!
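2. But why does looks_utf8() accept 0xe1 0x20 as valid UTF-8? It is NOT valid:
0xe1 is the lead byte of a three-byte sequence, so the next byte must be a
continuation byte of the form 10xxxxxx, and 0x20 (00100000) is not.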
Looking at /usr/src/contrib/nvi/common/encoding.c:
looks_utf8(const char *ibuf, size_t nbytes)
...
    for (n = 0; n < following; n++) {
        i++;
        if (i >= nbytes)
            goto done;
        if (buf[i] & 0x40)      /* 10xxxxxx */
            return -1;
    }
That's completely wrong: it only checks that bit 6 (0x40) of each following
byte is clear, and never tests that bit 7 (0x80) is set!
It should be:
    for (n = 0; n < following; n++) {
        i++;
        if (i >= nbytes)
            goto done;
        if ((buf[i] & 0xc0) != 0x80)    /* 10xxxxxx */
            return -1;
    }
This change was tested and works.
Please fix at least the broken auto-detection before 10.2-RELEASE! An option
to disable auto-detection, or to honor the user's setting in .exrc, is also
required.
More information about the freebsd-bugs mailing list