gnu/113343: [PATCH] grep(1) outputs NOT-matched lines (with
multi-bytes characters)
Kazuaki ODA
kazuaki at aliceblue.jp
Mon Jun 4 18:30:06 UTC 2007
>Number: 113343
>Category: gnu
>Synopsis: [PATCH] grep(1) outputs NOT-matched lines (with multi-bytes characters)
>Confidential: no
>Severity: non-critical
>Priority: medium
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Mon Jun 04 18:30:05 GMT 2007
>Closed-Date:
>Last-Modified:
>Originator: Kazuaki ODA
>Release: FreeBSD 6.2-RELEASE-p5 i386
>Organization:
>Environment:
System: FreeBSD eyes.aliceblue.jp 6.2-RELEASE-p5 FreeBSD 6.2-RELEASE-p5 #3: Sat May 26 12:45:48 JST 2007 kazuaki at eyes.aliceblue.jp:/usr/obj/usr/src/sys/EYES i386
>Description:
Our grep(1) mishandles multibyte characters in some cases.
If a byte sequence matches the search pattern, grep(1) outputs the line
containing that sequence. This is fine for single-byte characters, but
can be wrong for multibyte characters: if the matched sequence consists
of the second byte of one character and the first byte of the next
character, it is not a real match and grep(1) should not output the
line.
Since our grep(1) has support for multibyte characters (and locales),
it does not always misbehave in this way, but it sometimes does.
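The false match described above can be sketched outside grep(1). The
code below is only an illustration, not the code in search.c: it
assumes a toy encoding in which every byte >= 0x80 starts a two-byte
character, and the names char_len, find_bytes and find_chars are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed toy encoding, for illustration only: any byte >= 0x80 starts
 * a two-byte character; any other byte is a single-byte character. */
size_t char_len(unsigned char lead)
{
    return lead >= 0x80 ? 2 : 1;
}

/* Byte-oriented search: offset of the first occurrence of pat in line,
 * tried at every byte offset, or -1 if there is none. */
long find_bytes(const unsigned char *line, size_t n,
                const unsigned char *pat, size_t m)
{
    size_t i;

    assert(m > 0);
    for (i = 0; i + m <= n; i++)
        if (memcmp(line + i, pat, m) == 0)
            return (long) i;
    return -1;
}

/* Character-aware search: the pattern is only compared at offsets that
 * start a character, so a match straddling two characters is skipped. */
long find_chars(const unsigned char *line, size_t n,
                const unsigned char *pat, size_t m)
{
    size_t i = 0;

    assert(m > 0);
    while (i + m <= n) {
        if (memcmp(line + i, pat, m) == 0)
            return (long) i;
        i += char_len(line[i]);   /* step to the next character boundary */
    }
    return -1;
}
```

For the four-byte line A4 A2 A4 A4 (two two-byte characters) and the
pattern A2 A4, find_bytes reports a match at offset 1, which is the
second byte of the first character followed by the first byte of the
second. find_chars reports no match, which is the behaviour grep(1)
should have.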
>How-To-Repeat:
>Fix:
Apply the attached patch.
I think the mbstate_t object should be reinitialized whenever mbrlen()
returns (size_t) -2.
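As a minimal sketch of why the reset matters (not taken from search.c):
when mbrlen() returns (size_t) -2, the mbstate_t object still records a
partially consumed character, so a later scan that reuses it does not
start from the initial state. The helper names use_utf8_locale and
scan_after_reset are hypothetical, and the example assumes that one of
the listed UTF-8 locale names is installed.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: try a few common UTF-8 locale names.
 * Returns 1 on success, 0 if none of them is available. */
int use_utf8_locale(void)
{
    static const char *names[] = { "C.UTF-8", "en_US.UTF-8", "en_US.utf8" };
    size_t i;

    for (i = 0; i < sizeof names / sizeof names[0]; i++)
        if (setlocale(LC_CTYPE, names[i]) != NULL)
            return 1;
    return 0;
}

/* Feed mbrlen() a truncated character, reset the state, then scan "a".
 * Returns the length reported for "a", or -1 if the demonstration
 * could not be set up (no UTF-8 locale available). */
int scan_after_reset(void)
{
    mbstate_t mbs;
    size_t mlen;

    if (!use_utf8_locale())
        return -1;

    memset(&mbs, 0, sizeof mbs);
    /* 0xE3 is the lead byte of a three-byte UTF-8 sequence. */
    mlen = mbrlen("\xE3", 1, &mbs);
    if (mlen != (size_t) -2)
        return -1;

    /* mbs now holds a partial character.  Clear it before starting a
     * new scan; this is the step the patch adds. */
    memset(&mbs, 0, sizeof mbs);
    assert(mbsinit(&mbs));

    mlen = mbrlen("a", 1, &mbs);
    return (int) mlen;
}
```

With the reset in place, the second mbrlen() call sees "a" as a complete
one-byte character. Without it, the call would continue the unfinished
sequence and may report an error or a misleading length.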
--- search.c.diff begins here ---
--- gnu/usr.bin/grep/search.c.orig Wed Mar 22 05:51:35 2006
+++ gnu/usr.bin/grep/search.c Tue Jun 5 01:09:24 2007
@@ -400,9 +400,12 @@
}
if (mlen == (size_t) -2)
- /* Offset points inside multibyte character:
- * no good. */
- break;
+ {
+ /* Offset points inside multibyte character:
+ * no good. */
+ memset (&mbs, '\0', sizeof (mbstate_t));
+ break;
+ }
beg += mlen;
bytes_left -= mlen;
@@ -462,9 +465,12 @@
}
if (mlen == (size_t) -2)
- /* Offset points inside multibyte character:
- * no good. */
- break;
+ {
+ /* Offset points inside multibyte character:
+ * no good. */
+ memset (&mbs, '\0', sizeof (mbstate_t));
+ break;
+ }
beg += mlen;
bytes_left -= mlen;
@@ -925,15 +931,21 @@
}
if (mlen == (size_t) -2)
- /* Offset points inside multibyte character: no good. */
- break;
+ {
+ /* Offset points inside multibyte character: no good. */
+ memset (&mbs, '\0', sizeof (mbstate_t));
+ break;
+ }
beg += mlen;
bytes_left -= mlen;
}
if (bytes_left)
- continue;
+ {
+ beg += bytes_left;
+ continue;
+ }
}
else
#endif /* MBS_SUPPORT */
@@ -1051,6 +1063,7 @@
{
/* Offset points inside multibyte character:
* no good. */
+ memset (&mbs, '\0', sizeof (mbstate_t));
break;
}
--- search.c.diff ends here ---
>Release-Note:
>Audit-Trail:
>Unformatted: