svn commit: r354628 - in stable/11: contrib/netbsd-tests/usr.bin/grep usr.bin/grep usr.bin/grep/tests

Kyle Evans kevans at FreeBSD.org
Mon Nov 11 19:54:10 UTC 2019


Author: kevans
Date: Mon Nov 11 19:54:08 2019
New Revision: 354628
URL: https://svnweb.freebsd.org/changeset/base/354628

Log:
  MFC bsdgrep(1) fixes: r320414, r328559, r332805-r332806, r332809, r332832,
  r332850-r332852, r332856, r332858, r332876, r333351, r334803,
  r334806-r334809, r334821, r334837, r334889, r335188, r351769, r352691
  
  r320414:
  Expect :mmap_eof_not_eol to fail
  
  It relies on a jemalloc feature (opt.redzone) no longer available after
  r319971.
  
  r328559:
  Remove t_grep:mmap_eof_not_eol test
  
  The test was marked as an expected failure in r320414 after r319971's import
  of a newer jemalloc removed an essential feature (opt.redzone) for
  reproducing the behavior it was testing. Since then, no way has been found
  or demonstrated to reliably test the behavior, so remove the test.
  
  r332805:
  bsdgrep: Split match processing out of procfile
  
  procfile is getting kind of hairy, and it's not going to get better as we
  correct some more bits that assume we process one line at a time.
  
  r332806:
  bsdgrep: Clean up procmatches a little bit
  
  r332809:
  bsdgrep: Add some TODOs for future work on operating on chunks
  
  r332832:
  bsdgrep: Break procmatches down a little bit more
  
  Split the matching and non-matching cases out into their own functions to
  reduce future complexity. As the name implies, procmatches will eventually
  process more than one match itself in the future.
  
  r332850:
  bsdgrep: Some light cleanup
  
  There's no point checking for a bunch of file modes if we're not a
  practicing believer of DIR_SKIP or DEV_SKIP.
  
  This also reduces some style violations that were particularly ugly looking
  when browsing through.
  
  r332851:
  bsdgrep: More trivial cleanup/style cleanup
  
  We can avoid branching for these easily reduced patterns
  
  r332852:
  bsdgrep: if chain => switch
  
  This makes some of this a little easier to follow (in my opinion).
  
  r332856:
  bsdgrep: Fix --include/--exclude ordering issues
  
  Prior to r332851:
  * --exclude always wins out over --include
  * --exclude-dir always wins out over --include-dir
  
  r332851 broke that behavior, resulting in:
  * First of --exclude, --include wins
  * First of --exclude-dir, --include-dir wins
  
  As it turns out, both behaviors are wrong by modern grep standards: the
  last matching rule wins, e.g.:
  
  `grep --exclude foo --include foo 'thing' foo`
  foo is included
  
  `grep --include foo --exclude foo 'thing' foo`
  foo is excluded
  
  As tested with GNU grep 3.1.
  
  This commit makes bsdgrep follow this behavior.
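
  The "last matching rule wins" behavior described above can be sketched
  roughly as follows. This is a simplified illustration, not the committed
  code: the names sketch_pat, SKETCH_EXCL, SKETCH_INCL and last_match_wins are
  hypothetical, and unlike file_matching() in util.c it does not also try the
  basename of the path.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for bsdgrep's pattern modes and struct epat. */
enum sketch_mode { SKETCH_EXCL, SKETCH_INCL };

struct sketch_pat {
	const char		*pat;	/* fnmatch(3) glob */
	enum sketch_mode	 mode;	/* --exclude or --include */
};

/*
 * Walk the patterns in command-line order and let the last one that
 * matches decide.  If no pattern matches, a file is kept only when no
 * --include pattern was given at all.
 */
static bool
last_match_wins(const struct sketch_pat *pats, size_t npats, const char *name)
{
	bool have_include, ret;
	size_t i;

	assert(name != NULL);

	have_include = false;
	for (i = 0; i < npats; i++)
		if (pats[i].mode == SKETCH_INCL)
			have_include = true;

	ret = !have_include;
	for (i = 0; i < npats; i++)
		if (fnmatch(pats[i].pat, name, 0) == 0)
			/* No early exit: a later pattern may override. */
			ret = (pats[i].mode != SKETCH_EXCL);
	return (ret);
}
```

  Under these assumptions, passing { "foo", SKETCH_EXCL } followed by
  { "foo", SKETCH_INCL } keeps foo, while the reverse order drops it, which
  matches the two grep invocations quoted above.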
  
  r332858:
  bsdgrep: Use grep_strdup instead of grep_malloc+strcpy
  
  r332876:
  bsdgrep: Fix build failure WITHOUT_LZMA (incorrect bracket placement)
  
  r333351:
  bsdgrep: Allow "-" to be passed to -f to mean "standard input"
  
  A version of this patch was originally sent to me by se@, matching behavior
  from newer versions of GNU grep.
  
  While there have been some differences of opinion on whether stdin should be
  closed or not after depleting it in the process of handling -f, I've opted to
  leave stdin open and just let the later matching stuff fail and result in a
  no-match. I'm not married to the current behavior; it was generally chosen
  since we are adopting this in particular from GNU grep, and I would like to
  stay consistent without a strong argument to the contrary. The current
  behavior isn't technically wrong, it's just fairly unfriendly to the
  developer-user of grep who may not realize their usage is trivially invalid.
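
  A minimal sketch of the "-f -" handling described above, assuming the same
  approach as the read_patterns() hunk in this diff: "-" maps to stdin, and
  stdin is deliberately left open afterwards. The helper names
  open_pattern_source and close_pattern_source are hypothetical and exist only
  for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return stdin for "-", otherwise open the named pattern file. */
static FILE *
open_pattern_source(const char *fn)
{

	assert(fn != NULL);
	if (strcmp(fn, "-") == 0)
		return (stdin);
	return (fopen(fn, "r"));	/* NULL on failure; caller reports it */
}

/* Close the pattern source, but never close stdin. */
static void
close_pattern_source(const char *fn, FILE *f)
{

	assert(fn != NULL);
	if (f != NULL && strcmp(fn, "-") != 0)
		fclose(f);
}
```

  With this shape, a later attempt to search stdin simply finds it already
  drained and reports no match, which is the trade-off discussed above.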
  
  r334803:
  netbsd-tests: grep(1): Add test for -c flag
  
  Someone might be inclined to accidentally break this. Someone might have
  written said test because they broke it locally.
  
  r334806:
  bsdgrep(1): Do some less dirty things with return types
  
  Neither procfile nor grep_tree returns anything meaningful to their callers.
  None of the callers actually care about how many lines were matched in all
  of the files they processed; it's all about "did anything match?"
  
  This is generally just a light refactoring to remind me of what actually
  matters as I'm rewriting these bits to care less about 'stuff'.
  
  r334807:
  bsdgrep(1): whoops, garbage collect the now write-only variable
  
  r334808:
  bsdgrep(1): Don't initialize fts_flags twice
  
  Admittedly, this is a clang-scan complaint... but it wasn't wrong. fts_flags
  is initialized by all cases in the switch(), which should be fairly obvious.
  Annotate this anyways.
  
  r334809:
  netbsd-tests: bsdgrep(1): Add a test for -m, too
  
  r334821:
  bsdgrep(1): Slooowly peel away the chunky onion
  
  (or peel off the band-aid, whatever floats your boat)
  
  This addresses two separate issues:
  
  1.) Nothing within bsdgrep actually knew whether it cared about line numbers
    or not.
  
  2.) The file layer knew nothing about the context in which it was being
    called.
  
  #1 is only important when we're *not* processing line-by-line. #2 is
  debatably a good idea; the parsing context is only handy because that's
  where we store current offset information and, as of this commit, whether or
  not it needs to be line-aware.
  
  r334837:
  bsdgrep(1): Evict character sequence that moved in
  
  r334889:
  bsdgrep(1): Some more int -> bool conversions and name changes
  
  Again motivated by upcoming work to rewrite a bunch of this- single-letter
  variable names and slightly misleading variable names ("lastmatches" to
  indicate that the last matched) are not helpful.
  
  r335188:
  bsdgrep(1): Remove redundant initialization; unconditionally assigned later
  
  r351769:
  bsdgrep(1): add some basic tests for some GNU Extension support
  
  These will be expanded later as I come up with good test cases; for now,
  these seem to be enough to trigger bugs in base gnugrep and expose missing
  features in bsdgrep.
  
  r352691:
  bsdgrep(1): various fixes of empty pattern/exit code/-c behavior
  
  When an empty pattern is encountered in the pattern list, I had previously
  broken bsdgrep to count that as a "match all" and ignore any other patterns
  in the list. This commit rectifies that mistake, among others:
  
  - The -v flag semantics were not quite right; lines matched should have been
    counted differently based on whether the -v flag was set or not. procline
    now definitively returns whether it's matched or not, and interpreting
    that result has been kicked up a level.
  - Empty patterns with the -x flag were broken similarly to empty patterns
    with the -w flag. The former is a whole-line match and should be more
    strict, only matching blank lines. No -x and no -w will match the
    empty string at the beginning of each line.
  - The exit code with -L was broken, w.r.t. modern grep. Modern grep will
    exit(0) if any file that didn't match was output, so our interpretation
    was simply backwards. The new interpretation makes sense to me.
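
  The exit status rule from the last item can be sketched as a small pure
  function. It mirrors only the final lines of main() visible in this diff;
  the struct sketch_result and the function sketch_exit_status are
  hypothetical names introduced for illustration, and how "matched" is
  accumulated per file is outside the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bundle of the values that feed the final exit status. */
struct sketch_result {
	bool	matched;	/* aggregate "did anything match?" */
	bool	file_err;	/* some file could not be processed */
	bool	Lflag;		/* -L was given */
	bool	qflag;		/* -q was given */
};

static int
sketch_exit_status(const struct sketch_result *r)
{
	bool matched;

	assert(r != NULL);

	/* -L inverts the aggregate result before it is interpreted. */
	matched = r->Lflag ? !r->matched : r->matched;

	if (matched)
		/* A file error still yields 2, unless -q was given. */
		return ((r->file_err && !r->qflag) ? 2 : 0);
	return (r->file_err ? 2 : 1);
}
```

  Under these assumptions, -L with no match yields 0 and a plain search with
  no match yields 1, consistent with the updated "grep -L -e D test1" check in
  t_grep.sh.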
  
  Tests updated and added to try and catch some of this.
  
  This misbehavior was found by autoconf while fixing ports found in PR 229925
  expecting either a more sane or a more GNU-like sed.

Modified:
  stable/11/contrib/netbsd-tests/usr.bin/grep/t_grep.sh
  stable/11/usr.bin/grep/file.c
  stable/11/usr.bin/grep/grep.1
  stable/11/usr.bin/grep/grep.c
  stable/11/usr.bin/grep/grep.h
  stable/11/usr.bin/grep/tests/grep_freebsd_test.sh
  stable/11/usr.bin/grep/util.c
Directory Properties:
  stable/11/   (props changed)

Modified: stable/11/contrib/netbsd-tests/usr.bin/grep/t_grep.sh
==============================================================================
--- stable/11/contrib/netbsd-tests/usr.bin/grep/t_grep.sh	Mon Nov 11 19:06:04 2019	(r354627)
+++ stable/11/contrib/netbsd-tests/usr.bin/grep/t_grep.sh	Mon Nov 11 19:54:08 2019	(r354628)
@@ -413,6 +413,60 @@ wflag_emptypat_body()
 	atf_check -o file:test4 grep -w -e "" test4
 }
 
+atf_test_case xflag_emptypat
+xflag_emptypat_body()
+{
+	printf "" > test1
+	printf "\n" > test2
+	printf "qaz" > test3
+	printf " qaz\n" > test4
+
+	# -x is whole-line, more strict than -w.
+	atf_check -s exit:1 -o empty grep -x -e "" test1
+
+	atf_check -o file:test2 grep -x -e "" test2
+
+	atf_check -s exit:1 -o empty grep -x -e "" test3
+
+	atf_check -s exit:1 -o empty grep -x -e "" test4
+
+	total=$(wc -l /COPYRIGHT | sed 's/[^0-9]//g')
+
+	# Simple checks that grep -x with an empty pattern isn't matching every
+	# line.  The exact counts aren't important, as long as they don't
+	# match the total line count and as long as they don't match each other.
+	atf_check -o save:xpositive.count grep -Fxc '' /COPYRIGHT
+	atf_check -o save:xnegative.count grep -Fvxc '' /COPYRIGHT
+
+	atf_check -o not-inline:"${total}" cat xpositive.count
+	atf_check -o not-inline:"${total}" cat xnegative.count
+
+	atf_check -o not-file:xnegative.count cat xpositive.count
+}
+
+atf_test_case xflag_emptypat_plus
+xflag_emptypat_plus_body()
+{
+	printf "foo\n\nbar\n\nbaz\n" > target
+	printf "foo\n \nbar\n \nbaz\n" > target_spacelines
+	printf "foo\nbar\nbaz\n" > matches
+	printf " \n \n" > spacelines
+
+	printf "foo\n\nbar\n\nbaz\n" > patlist1
+	printf "foo\n\nba\n\nbaz\n" > patlist2
+
+	sed -e '/bar/d' target > matches_not2
+
+	# Normal handling first
+	atf_check -o file:target grep -Fxf patlist1 target
+	atf_check -o file:matches grep -Fxf patlist1 target_spacelines
+	atf_check -o file:matches_not2 grep -Fxf patlist2 target
+
+	# -v handling
+	atf_check -s exit:1 -o empty grep -Fvxf patlist1 target
+	atf_check -o file:spacelines grep -Fxvf patlist1 target_spacelines
+}
+
 atf_test_case excessive_matches
 excessive_matches_head()
 {
@@ -551,6 +605,12 @@ grep_nomatch_flags_head()
 
 grep_nomatch_flags_body()
 {
+	grep_type
+
+	if [ $? -eq $GREP_TYPE_GNU_FREEBSD ]; then
+		atf_expect_fail "this test does not pass with GNU grep in base"
+	fi
+
 	printf "A\nB\nC\n" > test1
 
 	atf_check -o inline:"1\n" grep -c -C 1 -e "B" test1
@@ -563,7 +623,7 @@ grep_nomatch_flags_body()
 	atf_check -o inline:"test1\n" grep -l -A 1 -e "B" test1
 	atf_check -o inline:"test1\n" grep -l -C 1 -e "B" test1
 
-	atf_check -s exit:1 -o inline:"test1\n" grep -L -e "D" test1
+	atf_check -o inline:"test1\n" grep -L -e "D" test1
 
 	atf_check -o empty grep -q -e "B" test1
 	atf_check -o empty grep -q -B 1 -e "B" test1
@@ -646,28 +706,6 @@ mmap_body()
 	atf_check -s exit:1 grep --mmap -e "Z" test1
 }
 
-atf_test_case mmap_eof_not_eol
-mmap_eof_not_eol_head()
-{
-	atf_set "descr" "Check --mmap flag handling of encountering EOF without EOL (PR 165471, 219402)"
-}
-mmap_eof_not_eol_body()
-{
-	grep_type
-	if [ $? -eq $GREP_TYPE_GNU ]; then
-		atf_expect_fail "gnu grep from ports has no --mmap option"
-	fi
-
-	printf "ABC" > test1
-	jot -b " "  -s "" 4096 >> test2
-
-	atf_check -s exit:0 -o inline:"B\n" grep --mmap -oe "B" test1
-	# Dependency on jemalloc(3) to detect buffer overflow, otherwise this
-	# unreliably produces a SIGSEGV or SIGBUS
-	atf_check -s exit:0 -o not-empty \
-	    env MALLOC_CONF="redzone:true" grep --mmap -e " " test2
-}
-
 atf_test_case matchall
 matchall_head()
 {
@@ -738,6 +776,38 @@ fgrep_oflag_body()
 	atf_check -s exit:1 grep -Fo "ghix" test1
 	atf_check -s exit:1 grep -Fo "abcdefghiklmnopqrstuvwxyz" test1
 }
+
+atf_test_case cflag
+cflag_head()
+{
+	atf_set "descr" "Check proper handling of -c"
+}
+cflag_body()
+{
+	printf "a\nb\nc\n" > test1
+
+	atf_check -o inline:"1\n" grep -Ec "a" test1
+	atf_check -o inline:"2\n" grep -Ec "a|b" test1
+	atf_check -o inline:"3\n" grep -Ec "a|b|c" test1
+
+	atf_check -o inline:"test1:2\n" grep -EHc "a|b" test1
+}
+
+atf_test_case mflag
+mflag_head()
+{
+	atf_set "descr" "Check proper handling of -m"
+}
+mflag_body()
+{
+	printf "a\nb\nc\nd\ne\nf\n" > test1
+
+	atf_check -o inline:"1\n" grep -m 1 -Ec "a" test1
+	atf_check -o inline:"2\n" grep -m 2 -Ec "a|b" test1
+	atf_check -o inline:"3\n" grep -m 3 -Ec "a|b|c|f" test1
+
+	atf_check -o inline:"test1:2\n" grep -m 2 -EHc "a|b|e|f" test1
+}
 # End FreeBSD
 
 atf_init_test_cases()
@@ -767,6 +837,8 @@ atf_init_test_cases()
 	atf_add_test_case egrep_empty_invalid
 	atf_add_test_case zerolen
 	atf_add_test_case wflag_emptypat
+	atf_add_test_case xflag_emptypat
+	atf_add_test_case xflag_emptypat_plus
 	atf_add_test_case excessive_matches
 	atf_add_test_case wv_combo_break
 	atf_add_test_case fgrep_sanity
@@ -777,10 +849,11 @@ atf_init_test_cases()
 	atf_add_test_case binary_flags
 	atf_add_test_case badcontext
 	atf_add_test_case mmap
-	atf_add_test_case mmap_eof_not_eol
 	atf_add_test_case matchall
 	atf_add_test_case fgrep_multipattern
 	atf_add_test_case fgrep_icase
 	atf_add_test_case fgrep_oflag
+	atf_add_test_case cflag
+	atf_add_test_case mflag
 # End FreeBSD
 }

Modified: stable/11/usr.bin/grep/file.c
==============================================================================
--- stable/11/usr.bin/grep/file.c	Mon Nov 11 19:06:04 2019	(r354627)
+++ stable/11/usr.bin/grep/file.c	Mon Nov 11 19:54:08 2019	(r354628)
@@ -86,6 +86,9 @@ static inline int
 grep_refill(struct file *f)
 {
 	ssize_t nr;
+#ifndef WITHOUT_LZMA
+	lzma_ret lzmaret;
+#endif
 
 	if (filebehave == FILE_MMAP)
 		return (0);
@@ -93,41 +96,52 @@ grep_refill(struct file *f)
 	bufpos = buffer;
 	bufrem = 0;
 
-	if (filebehave == FILE_GZIP) {
+	switch (filebehave) {
+	case FILE_GZIP:
 		nr = gzread(gzbufdesc, buffer, MAXBUFSIZ);
+		break;
 #ifndef WITHOUT_BZIP2
-	} else if (filebehave == FILE_BZIP && bzbufdesc != NULL) {
-		int bzerr;
+	case FILE_BZIP:
+		if (bzbufdesc != NULL) {
+			int bzerr;
 
-		nr = BZ2_bzRead(&bzerr, bzbufdesc, buffer, MAXBUFSIZ);
-		switch (bzerr) {
-		case BZ_OK:
-		case BZ_STREAM_END:
-			/* No problem, nr will be okay */
-			break;
-		case BZ_DATA_ERROR_MAGIC:
+			nr = BZ2_bzRead(&bzerr, bzbufdesc, buffer, MAXBUFSIZ);
+			switch (bzerr) {
+			case BZ_OK:
+			case BZ_STREAM_END:
+				/* No problem, nr will be okay */
+				break;
+			case BZ_DATA_ERROR_MAGIC:
+				/*
+				 * As opposed to gzread(), which simply returns the
+				 * plain file data, if it is not in the correct
+				 * compressed format, BZ2_bzRead() instead aborts.
+				 *
+				 * So, just restart at the beginning of the file again,
+				 * and use plain reads from now on.
+				 */
+				BZ2_bzReadClose(&bzerr, bzbufdesc);
+				bzbufdesc = NULL;
+				if (lseek(f->fd, 0, SEEK_SET) == -1)
+					return (-1);
+				nr = read(f->fd, buffer, MAXBUFSIZ);
+				break;
+			default:
+				/* Make sure we exit with an error */
+				nr = -1;
+			}
+		} else
 			/*
-			 * As opposed to gzread(), which simply returns the
-			 * plain file data, if it is not in the correct
-			 * compressed format, BZ2_bzRead() instead aborts.
-			 *
-			 * So, just restart at the beginning of the file again,
-			 * and use plain reads from now on.
+			 * Also an error case; we should never have a scenario
+			 * where we have an open file but no bzip descriptor
+			 * at this point. See: grep_open
 			 */
-			BZ2_bzReadClose(&bzerr, bzbufdesc);
-			bzbufdesc = NULL;
-			if (lseek(f->fd, 0, SEEK_SET) == -1)
-				return (-1);
-			nr = read(f->fd, buffer, MAXBUFSIZ);
-			break;
-		default:
-			/* Make sure we exit with an error */
 			nr = -1;
-		}
+		break;
 #endif
 #ifndef WITHOUT_LZMA
-	} else if ((filebehave == FILE_XZ) || (filebehave == FILE_LZMA)) {
-		lzma_ret ret;
+	case FILE_XZ:
+	case FILE_LZMA:
 		lstrm.next_out = buffer;
 
 		do {
@@ -143,23 +157,23 @@ grep_refill(struct file *f)
 				lstrm.avail_in = nr;
 			}
 
-			ret = lzma_code(&lstrm, laction);
+			lzmaret = lzma_code(&lstrm, laction);
 
-			if (ret != LZMA_OK && ret != LZMA_STREAM_END)
+			if (lzmaret != LZMA_OK && lzmaret != LZMA_STREAM_END)
 				return (-1);
 
-			if (lstrm.avail_out == 0 || ret == LZMA_STREAM_END) {
+			if (lstrm.avail_out == 0 || lzmaret == LZMA_STREAM_END) {
 				bufrem = MAXBUFSIZ - lstrm.avail_out;
 				lstrm.next_out = buffer;
 				lstrm.avail_out = MAXBUFSIZ;
 			}
-		} while (bufrem == 0 && ret != LZMA_STREAM_END);
+		} while (bufrem == 0 && lzmaret != LZMA_STREAM_END);
 
 		return (0);
-#endif	/* WIHTOUT_LZMA */
-	} else
+#endif	/* WITHOUT_LZMA */
+	default:
 		nr = read(f->fd, buffer, MAXBUFSIZ);
-
+	}
 	if (nr < 0)
 		return (-1);
 
@@ -180,7 +194,7 @@ grep_lnbufgrow(size_t newlen)
 }
 
 char *
-grep_fgetln(struct file *f, size_t *lenp)
+grep_fgetln(struct file *f, struct parsec *pc)
 {
 	unsigned char *p;
 	char *ret;
@@ -194,7 +208,7 @@ grep_fgetln(struct file *f, size_t *lenp)
 
 	if (bufrem == 0) {
 		/* Return zero length to indicate EOF */
-		*lenp = 0;
+		pc->ln.len= 0;
 		return (bufpos);
 	}
 
@@ -205,7 +219,7 @@ grep_fgetln(struct file *f, size_t *lenp)
 		len = p - bufpos;
 		bufrem -= len;
 		bufpos = p;
-		*lenp = len;
+		pc->ln.len = len;
 		return (ret);
 	}
 
@@ -240,11 +254,11 @@ grep_fgetln(struct file *f, size_t *lenp)
 		bufpos = p;
 		break;
 	}
-	*lenp = len;
+	pc->ln.len = len;
 	return (lnbuf);
 
 error:
-	*lenp = 0;
+	pc->ln.len = 0;
 	return (NULL);
 }
 
@@ -255,6 +269,9 @@ struct file *
 grep_open(const char *path)
 {
 	struct file *f;
+#ifndef WITHOUT_LZMA
+	lzma_ret lzmaret;
+#endif
 
 	f = grep_malloc(sizeof *f);
 	memset(f, 0, sizeof *f);
@@ -292,32 +309,36 @@ grep_open(const char *path)
 	if ((buffer == NULL) || (buffer == MAP_FAILED))
 		buffer = grep_malloc(MAXBUFSIZ);
 
-	if (filebehave == FILE_GZIP &&
-	    (gzbufdesc = gzdopen(f->fd, "r")) == NULL)
-		goto error2;
-
+	switch (filebehave) {
+	case FILE_GZIP:
+		if ((gzbufdesc = gzdopen(f->fd, "r")) == NULL)
+			goto error2;
+		break;
 #ifndef WITHOUT_BZIP2
-	if (filebehave == FILE_BZIP &&
-	    (bzbufdesc = BZ2_bzdopen(f->fd, "r")) == NULL)
-		goto error2;
+	case FILE_BZIP:
+		if ((bzbufdesc = BZ2_bzdopen(f->fd, "r")) == NULL)
+			goto error2;
+		break;
 #endif
 #ifndef WITHOUT_LZMA
-	else if ((filebehave == FILE_XZ) || (filebehave == FILE_LZMA)) {
-		lzma_ret ret;
+	case FILE_XZ:
+	case FILE_LZMA:
 
-		ret = (filebehave == FILE_XZ) ?
-			lzma_stream_decoder(&lstrm, UINT64_MAX,
-					LZMA_CONCATENATED) :
-			lzma_alone_decoder(&lstrm, UINT64_MAX);
+		if (filebehave == FILE_XZ)
+			lzmaret = lzma_stream_decoder(&lstrm, UINT64_MAX,
+			    LZMA_CONCATENATED);
+		else
+			lzmaret = lzma_alone_decoder(&lstrm, UINT64_MAX);
 
-		if (ret != LZMA_OK)
+		if (lzmaret != LZMA_OK)
 			goto error2;
 
 		lstrm.avail_in = 0;
 		lstrm.avail_out = MAXBUFSIZ;
 		laction = LZMA_RUN;
-	}
+		break;
 #endif
+	}
 
 	/* Fill read buffer, also catches errors early */
 	if (bufrem == 0 && grep_refill(f) != 0)
@@ -326,7 +347,7 @@ grep_open(const char *path)
 	/* Check for binary stuff, if necessary */
 	if (binbehave != BINFILE_TEXT && fileeol != '\0' &&
 	    memchr(bufpos, '\0', bufrem) != NULL)
-	f->binary = true;
+		f->binary = true;
 
 	return (f);
 

Modified: stable/11/usr.bin/grep/grep.1
==============================================================================
--- stable/11/usr.bin/grep/grep.1	Mon Nov 11 19:06:04 2019	(r354627)
+++ stable/11/usr.bin/grep/grep.1	Mon Nov 11 19:54:08 2019	(r354628)
@@ -30,7 +30,7 @@
 .\"
 .\"	@(#)grep.1	8.3 (Berkeley) 4/18/94
 .\"
-.Dd April 17, 2017
+.Dd May 7, 2018
 .Dt GREP 1
 .Os
 .Sh NAME
@@ -410,6 +410,13 @@ and block buffered otherwise.
 .El
 .Pp
 If no file arguments are specified, the standard input is used.
+Additionally,
+.Dq -
+may be used in place of a file name, anywhere that a file name is accepted, to
+read from standard input.
+This includes both
+.Fl f
+and file arguments.
 .Sh EXIT STATUS
 The
 .Nm grep

Modified: stable/11/usr.bin/grep/grep.c
==============================================================================
--- stable/11/usr.bin/grep/grep.c	Mon Nov 11 19:06:04 2019	(r354627)
+++ stable/11/usr.bin/grep/grep.c	Mon Nov 11 19:54:08 2019	(r354628)
@@ -239,20 +239,9 @@ static void
 add_pattern(char *pat, size_t len)
 {
 
-	/* Do not add further pattern is we already match everything */
-	if (matchall)
-	  return;
-
 	/* Check if we can do a shortcut */
 	if (len == 0) {
 		matchall = true;
-		for (unsigned int i = 0; i < patterns; i++) {
-			free(pattern[i].pat);
-		}
-		pattern = grep_realloc(pattern, sizeof(struct pat));
-		pattern[0].pat = NULL;
-		pattern[0].len = 0;
-		patterns = 1;
 		return;
 	}
 	/* Increase size if necessary */
@@ -319,7 +308,9 @@ read_patterns(const char *fn)
 	size_t len;
 	ssize_t rlen;
 
-	if ((f = fopen(fn, "r")) == NULL)
+	if (strcmp(fn, "-") == 0)
+		f = stdin;
+	else if ((f = fopen(fn, "r")) == NULL)
 		err(2, "%s", fn);
 	if ((fstat(fileno(f), &st) == -1) || (S_ISDIR(st.st_mode))) {
 		fclose(f);
@@ -336,7 +327,8 @@ read_patterns(const char *fn)
 	free(line);
 	if (ferror(f))
 		err(2, "%s", fn);
-	fclose(f);
+	if (strcmp(fn, "-") != 0)
+		fclose(f);
 }
 
 static inline const char *
@@ -357,6 +349,7 @@ main(int argc, char *argv[])
 	long long l;
 	unsigned int aargc, eargc, i;
 	int c, lastc, needpattern, newarg, prevoptind;
+	bool matched;
 
 	setlocale(LC_ALL, "");
 
@@ -701,7 +694,7 @@ main(int argc, char *argv[])
 	aargv += optind;
 
 	/* Empty pattern file matches nothing */
-	if (!needpattern && (patterns == 0))
+	if (!needpattern && (patterns == 0) && !matchall)
 		exit(1);
 
 	/* Fail if we don't have any pattern */
@@ -751,11 +744,10 @@ main(int argc, char *argv[])
 #endif
 	r_pattern = grep_calloc(patterns, sizeof(*r_pattern));
 
-	/* Don't process any patterns if we have a blank one */
 #ifdef WITH_INTERNAL_NOSPEC
-	if (!matchall && grepbehave != GREP_FIXED) {
+	if (grepbehave != GREP_FIXED) {
 #else
-	if (!matchall) {
+	{
 #endif
 		/* Check if cheating is allowed (always is for fgrep). */
 		for (i = 0; i < patterns; ++i) {
@@ -787,12 +779,13 @@ main(int argc, char *argv[])
 		exit(!procfile("-"));
 
 	if (dirbehave == DIR_RECURSE)
-		c = grep_tree(aargv);
+		matched = grep_tree(aargv);
 	else
-		for (c = 0; aargc--; ++aargv) {
+		for (matched = false; aargc--; ++aargv) {
 			if ((finclude || fexclude) && !file_matching(*aargv))
 				continue;
-			c+= procfile(*aargv);
+			if (procfile(*aargv))
+				matched = true;
 		}
 
 #ifndef WITHOUT_NLS
@@ -801,5 +794,8 @@ main(int argc, char *argv[])
 
 	/* Find out the correct return value according to the
 	   results and the command line option. */
-	exit(c ? (file_err ? (qflag ? 0 : 2) : 0) : (file_err ? 2 : 1));
+	if (Lflag)
+		matched = !matched;
+
+	exit(matched ? (file_err ? (qflag ? 0 : 2) : 0) : (file_err ? 2 : 1));
 }

Modified: stable/11/usr.bin/grep/grep.h
==============================================================================
--- stable/11/usr.bin/grep/grep.h	Mon Nov 11 19:06:04 2019	(r354627)
+++ stable/11/usr.bin/grep/grep.h	Mon Nov 11 19:54:08 2019	(r354628)
@@ -114,6 +114,21 @@ struct epat {
 	int		 mode;
 };
 
+/*
+ * Parsing context; used to hold things like matches made and
+ * other useful bits
+ */
+struct parsec {
+	regmatch_t	matches[MAX_MATCHES];		/* Matches made */
+	/* XXX TODO: This should be a chunk, not a line */
+	struct str	ln;				/* Current line */
+	size_t		lnstart;			/* Position in line */
+	size_t		matchidx;			/* Latest match index */
+	int		printed;			/* Metadata printed? */
+	bool		binary;				/* Binary file? */
+	bool		cntlines;			/* Count lines? */
+};
+
 /* Flags passed to regcomp() and regexec() */
 extern int	 cflags, eflags;
 
@@ -145,8 +160,8 @@ extern char	 re_error[RE_ERROR_BUF + 1];	/* Seems big 
 
 /* util.c */
 bool	 file_matching(const char *fname);
-int	 procfile(const char *fn);
-int	 grep_tree(char **argv);
+bool	 procfile(const char *fn);
+bool	 grep_tree(char **argv);
 void	*grep_malloc(size_t size);
 void	*grep_calloc(size_t nmemb, size_t size);
 void	*grep_realloc(void *ptr, size_t size);
@@ -161,4 +176,4 @@ void	 clearqueue(void);
 /* file.c */
 void		 grep_close(struct file *f);
 struct file	*grep_open(const char *path);
-char		*grep_fgetln(struct file *f, size_t *len);
+char		*grep_fgetln(struct file *f, struct parsec *pc);

Modified: stable/11/usr.bin/grep/tests/grep_freebsd_test.sh
==============================================================================
--- stable/11/usr.bin/grep/tests/grep_freebsd_test.sh	Mon Nov 11 19:06:04 2019	(r354627)
+++ stable/11/usr.bin/grep/tests/grep_freebsd_test.sh	Mon Nov 11 19:54:08 2019	(r354628)
@@ -81,8 +81,34 @@ rgrep_body()
 	atf_check -o file:d_grep_r_implied.out rgrep --exclude="*.out" -e "test" "$(atf_get_srcdir)"
 }
 
+atf_test_case gnuext
+gnuext_body()
+{
+	grep_type
+	_type=$?
+	if [ $_type -eq $GREP_TYPE_BSD ]; then
+		atf_expect_fail "this test requires GNU extensions in regex(3)"
+	elif [ $_type -eq $GREP_TYPE_GNU_FREEBSD ]; then
+		atf_expect_fail "\\s and \\S are known to be buggy in base gnugrep"
+	fi
+
+	atf_check -o save:grep_alnum.out grep -o '[[:alnum:]]' /COPYRIGHT
+	atf_check -o file:grep_alnum.out grep -o '\w' /COPYRIGHT
+
+	atf_check -o save:grep_nalnum.out grep -o '[^[:alnum:]]' /COPYRIGHT
+	atf_check -o file:grep_nalnum.out grep -o '\W' /COPYRIGHT
+
+	atf_check -o save:grep_space.out grep -o '[[:space:]]' /COPYRIGHT
+	atf_check -o file:grep_space.out grep -o '\s' /COPYRIGHT
+
+	atf_check -o save:grep_nspace.out grep -o '[^[:space:]]' /COPYRIGHT
+	atf_check -o file:grep_nspace.out grep -o '\S' /COPYRIGHT
+
+}
+
 atf_init_test_cases()
 {
 	atf_add_test_case grep_r_implied
 	atf_add_test_case rgrep
+	atf_add_test_case gnuext
 }

Modified: stable/11/usr.bin/grep/util.c
==============================================================================
--- stable/11/usr.bin/grep/util.c	Mon Nov 11 19:06:04 2019	(r354627)
+++ stable/11/usr.bin/grep/util.c	Mon Nov 11 19:54:08 2019	(r354628)
@@ -60,23 +60,24 @@ __FBSDID("$FreeBSD$");
 static bool	 first_match = true;
 
 /*
- * Parsing context; used to hold things like matches made and
- * other useful bits
+ * Match printing context
  */
-struct parsec {
-	regmatch_t	matches[MAX_MATCHES];		/* Matches made */
-	struct str	ln;				/* Current line */
-	size_t		lnstart;			/* Position in line */
-	size_t		matchidx;			/* Latest match index */
-	int		printed;			/* Metadata printed? */
-	bool		binary;				/* Binary file? */
+struct mprintc {
+	long long	tail;		/* Number of trailing lines to record */
+	int		last_outed;	/* Number of lines since last output */
+	bool		doctx;		/* Printing context? */
+	bool		printmatch;	/* Printing matches? */
+	bool		same_file;	/* Same file as previously printed? */
 };
 
+static void procmatch_match(struct mprintc *mc, struct parsec *pc);
+static void procmatch_nomatch(struct mprintc *mc, struct parsec *pc);
+static bool procmatches(struct mprintc *mc, struct parsec *pc, bool matched);
 #ifdef WITH_INTERNAL_NOSPEC
 static int litexec(const struct pat *pat, const char *string,
     size_t nmatch, regmatch_t pmatch[]);
 #endif
-static int procline(struct parsec *pc);
+static bool procline(struct parsec *pc);
 static void printline(struct parsec *pc, int sep);
 static void printline_metadata(struct str *line, int sep);
 
@@ -94,13 +95,12 @@ file_matching(const char *fname)
 
 	for (unsigned int i = 0; i < fpatterns; ++i) {
 		if (fnmatch(fpattern[i].pat, fname, 0) == 0 ||
-		    fnmatch(fpattern[i].pat, fname_base, 0) == 0) {
-			if (fpattern[i].mode == EXCL_PAT) {
-				ret = false;
-				break;
-			} else
-				ret = true;
-		}
+		    fnmatch(fpattern[i].pat, fname_base, 0) == 0)
+			/*
+			 * The last pattern matched wins exclusion/inclusion
+			 * rights, so we can't reasonably bail out early here.
+			 */
+			ret = (fpattern[i].mode != EXCL_PAT);
 	}
 	free(fname_buf);
 	return (ret);
@@ -114,13 +114,12 @@ dir_matching(const char *dname)
 	ret = dinclude ? false : true;
 
 	for (unsigned int i = 0; i < dpatterns; ++i) {
-		if (dname != NULL &&
-		    fnmatch(dpattern[i].pat, dname, 0) == 0) {
-			if (dpattern[i].mode == EXCL_PAT)
-				return (false);
-			else
-				ret = true;
-		}
+		if (dname != NULL && fnmatch(dpattern[i].pat, dname, 0) == 0)
+			/*
+			 * The last pattern matched wins exclusion/inclusion
+			 * rights, so we can't reasonably bail out early here.
+			 */
+			ret = (dpattern[i].mode != EXCL_PAT);
 	}
 	return (ret);
 }
@@ -129,17 +128,18 @@ dir_matching(const char *dname)
  * Processes a directory when a recursive search is performed with
  * the -R option.  Each appropriate file is passed to procfile().
  */
-int
+bool
 grep_tree(char **argv)
 {
 	FTS *fts;
 	FTSENT *p;
-	int c, fts_flags;
-	bool ok;
+	int fts_flags;
+	bool matched, ok;
 	const char *wd[] = { ".", NULL };
 
-	c = fts_flags = 0;
+	matched = false;
 
+	/* This switch effectively initializes 'fts_flags' */
 	switch(linkbehave) {
 	case LINK_EXPLICIT:
 		fts_flags = FTS_COMFOLLOW;
@@ -149,7 +149,6 @@ grep_tree(char **argv)
 		break;
 	default:
 		fts_flags = FTS_LOGICAL;
-			
 	}
 
 	fts_flags |= FTS_NOSTAT | FTS_NOCHDIR;
@@ -178,7 +177,7 @@ grep_tree(char **argv)
 		case FTS_DC:
 			/* Print a warning for recursive directory loop */
 			warnx("warning: %s: recursive directory loop",
-				p->fts_path);
+			    p->fts_path);
 			break;
 		default:
 			/* Check for file exclusion/inclusion */
@@ -186,44 +185,122 @@ grep_tree(char **argv)
 			if (fexclude || finclude)
 				ok &= file_matching(p->fts_path);
 
-			if (ok)
-				c += procfile(p->fts_path);
+			if (ok && procfile(p->fts_path))
+				matched = true;
 			break;
 		}
 	}
 
 	fts_close(fts);
-	return (c);
+	return (matched);
 }
 
+static void
+procmatch_match(struct mprintc *mc, struct parsec *pc)
+{
+
+	if (mc->doctx) {
+		if (!first_match && (!mc->same_file || mc->last_outed > 0))
+			printf("--\n");
+		if (Bflag > 0)
+			printqueue();
+		mc->tail = Aflag;
+	}
+
+	/* Print the matching line, but only if not quiet/binary */
+	if (mc->printmatch) {
+		printline(pc, ':');
+		while (pc->matchidx >= MAX_MATCHES) {
+			/* Reset matchidx and try again */
+			pc->matchidx = 0;
+			if (procline(pc) == !vflag)
+				printline(pc, ':');
+			else
+				break;
+		}
+		first_match = false;
+		mc->same_file = true;
+		mc->last_outed = 0;
+	}
+}
+
+static void
+procmatch_nomatch(struct mprintc *mc, struct parsec *pc)
+{
+
+	/* Deal with any -A context as needed */
+	if (mc->tail > 0) {
+		grep_printline(&pc->ln, '-');
+		mc->tail--;
+		if (Bflag > 0)
+			clearqueue();
+	} else if (Bflag == 0 || (Bflag > 0 && enqueue(&pc->ln)))
+		/*
+		 * Enqueue non-matching lines for -B context. If we're not
+		 * actually doing -B context or if the enqueue resulted in a
+		 * line being rotated out, then go ahead and increment
+		 * last_outed to signify a gap between context/match.
+		 */
+		++mc->last_outed;
+}
+
 /*
+ * Process any matches in the current parsing context, return a boolean
+ * indicating whether we should halt any further processing or not. 'true' to
+ * continue processing, 'false' to halt.
+ */
+static bool
+procmatches(struct mprintc *mc, struct parsec *pc, bool matched)
+{
+
+	/*
+	 * XXX TODO: This should loop over pc->matches and handle things on a
+	 * line-by-line basis, setting up a `struct str` as needed.
+	 */
+	/* Deal with any -B context or context separators */
+	if (matched) {
+		procmatch_match(mc, pc);
+
+		/* Count the matches if we have a match limit */
+		if (mflag) {
+			/* XXX TODO: Decrement by number of matched lines */
+			mcount -= 1;
+			if (mcount <= 0)
+				return (false);
+		}
+	} else if (mc->doctx)
+		procmatch_nomatch(mc, pc);
+
+	return (true);
+}
+
+/*
  * Opens a file and processes it.  Each file is processed line-by-line
  * passing the lines to procline().
  */
-int
+bool
 procfile(const char *fn)
 {
 	struct parsec pc;
-	long long tail;
+	struct mprintc mc;
 	struct file *f;
 	struct stat sb;
-	struct str *ln;
 	mode_t s;
-	int c, last_outed, t;
-	bool doctx, printmatch, same_file;
+	int lines;
+	bool line_matched;
 
 	if (strcmp(fn, "-") == 0) {
 		fn = label != NULL ? label : getstr(1);
 		f = grep_open(NULL);
 	} else {
-		if (!stat(fn, &sb)) {
+		if (stat(fn, &sb) == 0) {
 			/* Check if we need to process the file */
 			s = sb.st_mode & S_IFMT;
-			if (s == S_IFDIR && dirbehave == DIR_SKIP)
-				return (0);
-			if ((s == S_IFIFO || s == S_IFCHR || s == S_IFBLK
-				|| s == S_IFSOCK) && devbehave == DEV_SKIP)
-					return (0);
+			if (dirbehave == DIR_SKIP && s == S_IFDIR)
+				return (false);
+			if (devbehave == DEV_SKIP && (s == S_IFIFO ||
+			    s == S_IFCHR || s == S_IFBLK || s == S_IFSOCK))
+				return (false);
 		}
 		f = grep_open(fn);
 	}
@@ -231,39 +308,41 @@ procfile(const char *fn)
 		file_err = true;
 		if (!sflag)
 			warn("%s", fn);
-		return (0);
+		return (false);
 	}
 
-	/* Convenience */
-	ln = &pc.ln;
-	pc.ln.file = grep_malloc(strlen(fn) + 1);
-	strcpy(pc.ln.file, fn);
+	pc.ln.file = grep_strdup(fn);
 	pc.ln.line_no = 0;
 	pc.ln.len = 0;
 	pc.ln.boff = 0;
 	pc.ln.off = -1;
 	pc.binary = f->binary;
-	pc.printed = 0;
-	tail = 0;
-	last_outed = 0;
-	same_file = false;
-	doctx = false;
-	printmatch = true;
+	pc.cntlines = false;
+	memset(&mc, 0, sizeof(mc));
+	mc.printmatch = true;
 	if ((pc.binary && binbehave == BINFILE_BIN) || cflag || qflag ||
 	    lflag || Lflag)
-		printmatch = false;
-	if (printmatch && (Aflag != 0 || Bflag != 0))
-		doctx = true;
+		mc.printmatch = false;
+	if (mc.printmatch && (Aflag != 0 || Bflag != 0))
+		mc.doctx = true;
+	if (mc.printmatch && (Aflag != 0 || Bflag != 0 || mflag || nflag))
+		pc.cntlines = true;
 	mcount = mlimit;
 
-	for (c = 0;  c == 0 || !(lflag || qflag); ) {
+	for (lines = 0; lines == 0 || !(lflag || qflag); ) {
+		/*
+		 * XXX TODO: We need to revisit this in a chunking world. We're
+		 * not going to be doing per-line statistics because of the
+		 * overhead involved. procmatches can figure that stuff out as
+		 * needed. */
 		/* Reset per-line statistics */
 		pc.printed = 0;
 		pc.matchidx = 0;
 		pc.lnstart = 0;
 		pc.ln.boff = 0;
 		pc.ln.off += pc.ln.len + 1;
-		if ((pc.ln.dat = grep_fgetln(f, &pc.ln.len)) == NULL ||
+		/* XXX TODO: Grab a chunk */
+		if ((pc.ln.dat = grep_fgetln(f, &pc)) == NULL ||
 		    pc.ln.len == 0)
 			break;
 
@@ -279,59 +358,13 @@ procfile(const char *fn)
 			return (0);
 		}
 
-		if ((t = procline(&pc)) == 0)
-			++c;
+		line_matched = procline(&pc) == !vflag;
+		if (line_matched)
+			++lines;
 
-		/* Deal with any -B context or context separators */
-		if (t == 0 && doctx) {
-			if (!first_match && (!same_file || last_outed > 0))
-				printf("--\n");
-			if (Bflag > 0)
-				printqueue();
-			tail = Aflag;
-		}
-		/* Print the matching line, but only if not quiet/binary */
-		if (t == 0 && printmatch) {
-			printline(&pc, ':');
-			while (pc.matchidx >= MAX_MATCHES) {
-				/* Reset matchidx and try again */
-				pc.matchidx = 0;
-				if (procline(&pc) == 0)
-					printline(&pc, ':');
-				else
-					break;
-			}
-			first_match = false;
-			same_file = true;
-			last_outed = 0;
-		}
-		if (t != 0 && doctx) {
-			/* Deal with any -A context */
-			if (tail > 0) {
-				grep_printline(&pc.ln, '-');
-				tail--;
-				if (Bflag > 0)
-					clearqueue();
-			} else {
-				/*
-				 * Enqueue non-matching lines for -B context.
-				 * If we're not actually doing -B context or if
-				 * the enqueue resulted in a line being rotated
-				 * out, then go ahead and increment last_outed
-				 * to signify a gap between context/match.
-				 */
-				if (Bflag == 0 || (Bflag > 0 && enqueue(ln)))
-					++last_outed;
-			}
-		}
-
-		/* Count the matches if we have a match limit */
-		if (t == 0 && mflag) {
-			--mcount;
-			if (mflag && mcount <= 0)
-				break;
-		}
-
+		/* Halt processing if we hit our match limit */
+		if (!procmatches(&mc, &pc, line_matched))
+			break;
 	}
 	if (Bflag > 0)
 		clearqueue();
@@ -340,19 +373,19 @@ procfile(const char *fn)
 	if (cflag) {
 		if (!hflag)
 			printf("%s:", pc.ln.file);
-		printf("%u\n", c);
+		printf("%u\n", lines);
 	}

*** DIFF OUTPUT TRUNCATED AT 1000 LINES ***