gnu/113343: [PATCH] grep(1) outputs non-matching lines (with multibyte characters)

Kazuaki ODA kazuaki at aliceblue.jp
Mon Jun 4 18:30:06 UTC 2007


>Number:         113343
>Category:       gnu
>Synopsis:       [PATCH] grep(1) outputs non-matching lines (with multibyte characters)
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Jun 04 18:30:05 GMT 2007
>Closed-Date:
>Last-Modified:
>Originator:     Kazuaki ODA
>Release:        FreeBSD 6.2-RELEASE-p5 i386
>Organization:
>Environment:
System: FreeBSD eyes.aliceblue.jp 6.2-RELEASE-p5 FreeBSD 6.2-RELEASE-p5 #3: Sat May 26 12:45:48 JST 2007 kazuaki at eyes.aliceblue.jp:/usr/obj/usr/src/sys/EYES i386


	
>Description:
	Our grep(1) mishandles multibyte characters in some cases.
	If a byte sequence matches the search pattern, grep(1) outputs the line
	containing that sequence.  This is correct for single-byte characters
	but can be wrong for multibyte characters: if the matched sequence
	consists of the second byte of one character and the first byte of the
	next, it is not a real match, and grep(1) should not output the line.
	Because our grep(1) supports multibyte characters (and locales), it
	usually rejects such false matches, but in some cases it does not.
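
	To illustrate the false match described above, here is a minimal
	sketch.  It does not use grep(1)'s code or a real locale: the
	two-byte "toy" encoding and the names toy_charlen(),
	toy_on_boundary(), byte_search() and char_search() are all
	assumptions made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* ASSUMPTION: toy encoding, not a real locale.  A byte >= 0x80 starts a
 * two-byte character; any other byte is a complete character. */
static size_t toy_charlen(const unsigned char *p, size_t left)
{
    return (*p >= 0x80 && left >= 2) ? 2 : 1;
}

/* Return 1 if byte offset `off` is the start of a character (or the end
 * of the buffer), walking forward from the start of buf. */
static int toy_on_boundary(const unsigned char *buf, size_t len, size_t off)
{
    size_t i = 0;

    while (i < off && i < len)
        i += toy_charlen(buf + i, len - i);
    return i == off;
}

/* Plain byte-wise search: offset of the first occurrence, or -1. */
static long byte_search(const unsigned char *buf, size_t len,
                        const unsigned char *pat, size_t plen)
{
    assert(plen > 0);
    for (size_t i = 0; i + plen <= len; i++)
        if (memcmp(buf + i, pat, plen) == 0)
            return (long)i;
    return -1;
}

/* Character-aware search: a byte match counts only if it starts on a
 * character boundary. */
static long char_search(const unsigned char *buf, size_t len,
                        const unsigned char *pat, size_t plen)
{
    assert(plen > 0);
    for (size_t i = 0; i + plen <= len; i++)
        if (memcmp(buf + i, pat, plen) == 0 &&
            toy_on_boundary(buf, len, i))
            return (long)i;
    return -1;
}
```

	With the text bytes A4 A2 A4 A4 (two two-byte characters) and the
	pattern A2 A4, byte_search() reports a hit at offset 1, which
	straddles the two characters, while char_search() rejects it.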
>How-To-Repeat:
	
>Fix:

	Apply the attached patch.
	I think the mbstate_t should be reset to its initial state whenever
	mbrlen() returns (size_t)-2.
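
	The idea can be sketched outside grep(1) as follows.  This is not
	the code from search.c: count_units() and use_utf8_locale() are
	hypothetical helpers, and the locale names tried are assumptions
	that may not be installed on a given system.  The point is only
	that a shared mbstate_t is cleared when mbrlen() reports a
	truncated character, so the next scan starts from the initial
	conversion state.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Try to select a UTF-8 locale.  The names are assumptions and may not
 * exist everywhere.  Returns 1 on success, 0 otherwise. */
static int use_utf8_locale(void)
{
    return setlocale(LC_CTYPE, "C.UTF-8") != NULL ||
           setlocale(LC_CTYPE, "en_US.UTF-8") != NULL;
}

/* Walk buf[0..len) one multibyte character at a time, sharing *mbs
 * between calls.  Returns the number of units consumed before the end
 * of the buffer or a truncated character; an invalid byte or a NUL
 * counts as one unit. */
static size_t count_units(const char *buf, size_t len, mbstate_t *mbs)
{
    size_t n = 0;

    assert(mbs != NULL);
    while (len > 0) {
        size_t mlen = mbrlen(buf, len, mbs);

        if (mlen == (size_t)-2) {
            /* Truncated character: *mbs now holds a partial sequence.
             * Clear it so the next scan is not confused by it. */
            memset(mbs, '\0', sizeof *mbs);
            break;
        }
        if (mlen == (size_t)-1) {
            /* Invalid sequence: the state is unspecified, so reset it
             * and step over a single byte. */
            memset(mbs, '\0', sizeof *mbs);
            mlen = 1;
        } else if (mlen == 0) {
            mlen = 1;   /* embedded NUL occupies one byte */
        }
        buf += mlen;
        len -= mlen;
        n++;
    }
    return n;
}
```

	In a UTF-8 locale, scanning the three bytes 'a' E3 81 stops at the
	truncated character; because the state is cleared, mbsinit()
	reports the initial state afterwards and a following scan of "bc"
	with the same mbstate_t sees two ordinary characters.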

--- search.c.diff begins here ---
--- gnu/usr.bin/grep/search.c.orig	Wed Mar 22 05:51:35 2006
+++ gnu/usr.bin/grep/search.c	Tue Jun  5 01:09:24 2007
@@ -400,9 +400,12 @@
 			}
 
 		      if (mlen == (size_t) -2)
-			/* Offset points inside multibyte character:
-			 * no good. */
-			break;
+			{
+			  /* Offset points inside multibyte character:
+			   * no good. */
+			  memset (&mbs, '\0', sizeof (mbstate_t));
+			  break;
+			}
 
 		      beg += mlen;
 		      bytes_left -= mlen;
@@ -462,9 +465,12 @@
 			}
 
 		      if (mlen == (size_t) -2)
-			/* Offset points inside multibyte character:
-			 * no good. */
-			break;
+			{
+			  /* Offset points inside multibyte character:
+			   * no good. */
+			  memset (&mbs, '\0', sizeof (mbstate_t));
+			  break;
+			}
 
 		      beg += mlen;
 		      bytes_left -= mlen;
@@ -925,15 +931,21 @@
 		}
 
 	      if (mlen == (size_t) -2)
-		/* Offset points inside multibyte character: no good. */
-		break;
+		{
+		  /* Offset points inside multibyte character: no good. */
+		  memset (&mbs, '\0', sizeof (mbstate_t));
+		  break;
+		}
 
 	      beg += mlen;
 	      bytes_left -= mlen;
 	    }
 
 	  if (bytes_left)
-	    continue;
+	    {
+	      beg += bytes_left;
+	      continue;
+	    }
 	}
       else
 #endif /* MBS_SUPPORT */
@@ -1051,6 +1063,7 @@
 			    {
 			      /* Offset points inside multibyte character:
 			       * no good. */
+			      memset (&mbs, '\0', sizeof (mbstate_t));
 			      break;
 			    }
 
--- search.c.diff ends here ---


>Release-Note:
>Audit-Trail:
>Unformatted:

