svn commit: r210389 - in head: . share/mk tools/build/options usr.bin usr.bin/grep usr.bin/grep/nls

Thu Jul 22 19:11:58 UTC 2010

Author: gabor
Date: Thu Jul 22 19:11:57 2010
New Revision: 210389
URL: http://svn.freebsd.org/changeset/base/210389

Log:
  Add BSD grep to the base system and make it our default grep.
  
  Deliverables: Small and clean code (1.4 KSLOC vs GNU's 8.5 KSLOC),
                lower memory usage than GNU grep, GNU compatibility,
                BSD license.
  
  TODO:         Performance is somewhat behind GNU grep, but the gap is only
                significant for bigger searches.  The reason is complex; the
                most important factor is that GNU grep uses lots of
                optimizations to improve the speed of the regex library.
                First, we need a modern regex library (practically by adopting
                TRE), then add support for GNU-style non-standard regexes, and
                then reevaluate the performance issues and look for
                bottlenecks.  In the meantime, those who need better
                performance can build GNU grep by setting WITH_GNU_GREP.
  
  Approved by:            delphij (mentor)
  Obtained from:          OpenBSD (http://www.openbsd.org/cgi-bin/cvsweb/src/usr.bin/grep/),
                          freegrep (http://github.com/howardjp/freegrep)
  Sponsored by:           Google SoC 2008
  Portbuild tests run by: kris, pav, erwin
  Acknowledgements to:    fjoe (as SoC 2008 mentor),
                          everyone who helped in reviewing and testing

Added:
  head/tools/build/options/WITH_GNU_GREP   (contents, props changed)
  head/usr.bin/grep/
  head/usr.bin/grep/Makefile   (contents, props changed)
  head/usr.bin/grep/fastgrep.c   (contents, props changed)
  head/usr.bin/grep/file.c   (contents, props changed)
  head/usr.bin/grep/grep.1   (contents, props changed)
  head/usr.bin/grep/grep.c   (contents, props changed)
  head/usr.bin/grep/grep.h   (contents, props changed)
  head/usr.bin/grep/nls/
  head/usr.bin/grep/nls/C.msg   (contents, props changed)
  head/usr.bin/grep/nls/Makefile.inc   (contents, props changed)
  head/usr.bin/grep/nls/es_ES.ISO8859-1.msg   (contents, props changed)
  head/usr.bin/grep/nls/gl_ES.ISO8859-1.msg   (contents, props changed)
  head/usr.bin/grep/nls/hu_HU.ISO8859-2.msg   (contents, props changed)
  head/usr.bin/grep/nls/pt_BR.ISO8859-1.msg   (contents, props changed)
  head/usr.bin/grep/queue.c   (contents, props changed)
  head/usr.bin/grep/util.c   (contents, props changed)
Deleted:
  head/tools/build/options/WITHOUT_GNU_GREP
Modified:
  head/UPDATING
  head/share/mk/bsd.own.mk
  head/usr.bin/Makefile

Modified: head/UPDATING
==============================================================================

--- head/UPDATING	Thu Jul 22 19:09:34 2010	(r210388)
+++ head/UPDATING	Thu Jul 22 19:11:57 2010	(r210389)
@@ -22,6 +22,18 @@ NOTE TO PEOPLE WHO THINK THAT FreeBSD 9.
 	machines to maximize performance.  (To disable malloc debugging, run
 	ln -s aj /etc/malloc.conf.)
 
+20100722:
+	BSD grep has been imported to the base system and it is built by
+	default.  It is completely BSD licensed, highly GNU-compatible, uses
+	less memory than its GNU counterpart and has a small codebase.
+	However, it is slower than its GNU counterpart, which is mostly
+	noticeable for larger searches; for smaller ones it is measurable
+	but not significant.  The reason is complex; the most important factor
+	is that we lack a modern and efficient regex library and GNU grep
+	overcomes this by optimizing the searches internally.  Future work
+	on improving the regex performance is planned; in the meantime,
+	users who need better performance can build GNU grep instead by
+	setting the WITH_GNU_GREP knob.
 
 20100713:
 	Due to the import of powerpc64 support, all existing powerpc kernel

Modified: head/share/mk/bsd.own.mk
==============================================================================
--- head/share/mk/bsd.own.mk	Thu Jul 22 19:09:34 2010	(r210388)
+++ head/share/mk/bsd.own.mk	Thu Jul 22 19:11:57 2010	(r210389)
@@ -334,7 +334,6 @@ _clang_no=CLANG
     GCOV \
     GDB \
     GNU \
-    GNU_GREP \
     GPIB \
     GROFF \
     HTML \
@@ -422,6 +421,7 @@ MK_${var}:=	yes
     BIND_XML \
     ${_clang_no} \
     FDT \
+    GNU_GREP \
     HESIOD \
     IDEA
 .if defined(WITH_${var}) && defined(WITHOUT_${var})

Added: head/tools/build/options/WITH_GNU_GREP
==============================================================================
--- /dev/null	00:00:00 1970	(empty, because file is newly added)
+++ head/tools/build/options/WITH_GNU_GREP	Thu Jul 22 19:11:57 2010	(r210389)
@@ -0,0 +1,2 @@
+.\" $FreeBSD$
+Set to build the base system with GNU grep instead of BSD grep.

Modified: head/usr.bin/Makefile
==============================================================================
--- head/usr.bin/Makefile	Thu Jul 22 19:09:34 2010	(r210388)
+++ head/usr.bin/Makefile	Thu Jul 22 19:11:57 2010	(r210389)
@@ -79,6 +79,7 @@ SUBDIR=	alias \
 	getent \
 	getopt \
 	${_gprof} \
+	${_grep} \
 	gzip \
 	head \
 	${_hesinfo} \
@@ -284,6 +285,10 @@ _calendar=	calendar
 _clang=		clang
 .endif
 
+.if ${MK_GNU_GREP} != "yes"
+_grep=		grep
+.endif
+
 .if ${MK_HESIOD} != "no"
 _hesinfo=	hesinfo
 .endif

Added: head/usr.bin/grep/Makefile
==============================================================================
--- /dev/null	00:00:00 1970	(empty, because file is newly added)
+++ head/usr.bin/grep/Makefile	Thu Jul 22 19:11:57 2010	(r210389)
@@ -0,0 +1,35 @@
+#	$FreeBSD$
+#	$OpenBSD: Makefile,v 1.6 2003/06/25 15:00:04 millert Exp $
+
+PROG=	grep
+SRCS=	fastgrep.c file.c grep.c queue.c util.c
+LINKS=	${BINDIR}/grep ${BINDIR}/egrep \
+	${BINDIR}/grep ${BINDIR}/fgrep \
+	${BINDIR}/grep ${BINDIR}/zgrep \
+	${BINDIR}/grep ${BINDIR}/zegrep \
+	${BINDIR}/grep ${BINDIR}/zfgrep \
+
+MLINKS= grep.1 egrep.1 \
+	grep.1 fgrep.1 \
+	grep.1 zgrep.1 \
+	grep.1 zegrep.1 \
+	grep.1 zfgrep.1
+
+WARNS?=	6
+
+LDADD=	-lz -lbz2
+DPADD=	${LIBZ} ${LIBBZ2}
+
+.if !defined(WITHOUT_GNU_COMPAT)
+CFLAGS+= -I/usr/include/gnu
+LDADD+=	-lgnuregex
+DPADD+=	${LIBGNUREGEX}
+.endif
+
+.if !defined(WITHOUT_NLS)
+.include "${.CURDIR}/nls/Makefile.inc"
+.else
+CFLAGS+= -DWITHOUT_NLS
+.endif
+
+.include <bsd.prog.mk>

Added: head/usr.bin/grep/fastgrep.c
==============================================================================
--- /dev/null	00:00:00 1970	(empty, because file is newly added)
+++ head/usr.bin/grep/fastgrep.c	Thu Jul 22 19:11:57 2010	(r210389)
@@ -0,0 +1,333 @@
+/*	$OpenBSD: util.c,v 1.36 2007/10/02 17:59:18 otto Exp $	*/
+
+/*-
+ * Copyright (c) 1999 James Howard and Dag-Erling Coïdan Smørgrav
+ * Copyright (C) 2008 Gabor Kovesdan <gabor at FreeBSD.org>
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+
+/*
+ * XXX: This file is a speed up for grep to cover the defects of the
+ * regex library.  These optimizations should practically be implemented
+ * there keeping this code clean.  This is a future TODO, but for the
+ * meantime, we need to use this workaround.
+ */
+
+#include <sys/cdefs.h>
+__FBSDID("$FreeBSD$");
+
+#include <limits.h>
+#include <stdbool.h>
+#include <stdlib.h>
+#include <string.h>
+#include <wchar.h>
+#include <wctype.h>
+
+#include "grep.h"
+
+static int	grep_cmp(const unsigned char *, const unsigned char *, size_t);
+static void	grep_revstr(unsigned char *, int);
+
+void
+fgrepcomp(fastgrep_t *fg, const char *pat)
+{
+	unsigned int i;
+
+	/* Initialize. */
+	fg->len = strlen(pat);
+	fg->bol = false;
+	fg->eol = false;
+	fg->reversed = false;
+
+	fg->pattern = grep_malloc(strlen(pat) + 1);
+	strcpy(fg->pattern, pat);
+
+	/* Preprocess pattern. */
+	for (i = 0; i <= UCHAR_MAX; i++)
+		fg->qsBc[i] = fg->len;
+	for (i = 1; i < fg->len; i++)
+		fg->qsBc[fg->pattern[i]] = fg->len - i;
+}
+
+/*
+ * Returns: -1 on failure, 0 on success
+ */
+int
+fastcomp(fastgrep_t *fg, const char *pat)
+{
+	unsigned int i;
+	int firstHalfDot = -1;
+	int firstLastHalfDot = -1;
+	int hasDot = 0;
+	int lastHalfDot = 0;
+	int shiftPatternLen;
+	bool bol = false;
+	bool eol = false;
+
+	/* Initialize. */
+	fg->len = strlen(pat);
+	fg->bol = false;
+	fg->eol = false;
+	fg->reversed = false;
+
+	/* Remove end-of-line character ('$'). */
+	if (fg->len > 0 && pat[fg->len - 1] == '$') {
+		eol = true;
+		fg->eol = true;
+		fg->len--;
+	}
+
+	/* Remove beginning-of-line character ('^'). */
+	if (pat[0] == '^') {
+		bol = true;
+		fg->bol = true;
+		fg->len--;
+	}
+
+	if (fg->len >= 14 &&
+	    strncmp(pat + (fg->bol ? 1 : 0), "[[:<:]]", 7) == 0 &&
+	    strncmp(pat + (fg->bol ? 1 : 0) + fg->len - 7, "[[:>:]]", 7) == 0) {
+		fg->len -= 14;
+		/* Word boundary is handled separately in util.c */
+		wflag = true;
+	}
+
+	/*
+	 * Copy pattern minus '^' and '$' characters as well as word
+	 * match character classes at the beginning and ending of the
+	 * string respectively.
+	 */
+	fg->pattern = grep_malloc(fg->len + 1);
+	memcpy(fg->pattern, pat + (bol ? 1 : 0) + wflag, fg->len);
+	fg->pattern[fg->len] = '\0';
+
+	/* Look for ways to cheat...er...avoid the full regex engine. */
+	for (i = 0; i < fg->len; i++) {
+		/* Can still cheat? */
+		if (fg->pattern[i] == '.') {
+			hasDot = i;
+			if (i < fg->len / 2) {
+				if (firstHalfDot < 0)
+					/* Closest dot to the beginning */
+					firstHalfDot = i;
+			} else {
+				/* Closest dot to the end of the pattern. */
+				lastHalfDot = i;
+				if (firstLastHalfDot < 0)
+					firstLastHalfDot = i;
+			}
+		} else {
+			/* Free memory and let others know this is empty. */
+			free(fg->pattern);
+			fg->pattern = NULL;
+			return (-1);
+		}
+	}
+
+	/*
+	 * Determine if a reverse search would be faster based on the placement
+	 * of the dots.
+	 */
+	if ((!(lflag || cflag)) && ((!(bol || eol)) &&
+	    ((lastHalfDot) && ((firstHalfDot < 0) ||
+	    ((fg->len - (lastHalfDot + 1)) < (size_t)firstHalfDot)))) &&
+	    !oflag && !color) {
+		fg->reversed = true;
+		hasDot = fg->len - (firstHalfDot < 0 ?
+		    firstLastHalfDot : firstHalfDot) - 1;
+		grep_revstr(fg->pattern, fg->len);
+	}
+
+	/*
+	 * Normal Quick Search would require a shift based on the position the
+	 * next character after the comparison is within the pattern.  With
+	 * wildcards, the position of the last dot affects the maximum shift
+	 * distance.
+	 * The closer to the end the wild card is the slower the search.  A
+	 * reverse version of this algorithm would be useful for wildcards near
+	 * the end of the string.
+	 *
+	 * Examples:
+	 * Pattern	Max shift
+	 * -------	---------
+	 * this		5
+	 * .his		4
+	 * t.is		3
+	 * th.s		2
+	 * thi.		1
+	 */
+
+	/* Adjust the shift based on location of the last dot ('.'). */
+	shiftPatternLen = fg->len - hasDot;
+
+	/* Preprocess pattern. */
+	for (i = 0; i <= (signed)UCHAR_MAX; i++)
+		fg->qsBc[i] = shiftPatternLen;
+	for (i = hasDot + 1; i < fg->len; i++) {
+		fg->qsBc[fg->pattern[i]] = fg->len - i;
+	}
+
+	/*
+	 * Put pattern back to normal after pre-processing to allow for easy
+	 * comparisons later.
+	 */
+	if (fg->reversed)
+		grep_revstr(fg->pattern, fg->len);
+
+	return (0);
+}
+
+int
+grep_search(fastgrep_t *fg, unsigned char *data, size_t len, regmatch_t *pmatch)
+{
+	unsigned int j;
+	int ret = REG_NOMATCH;
+
+	if (pmatch->rm_so == (ssize_t)len)
+		return (ret);
+
+	if (fg->bol && pmatch->rm_so != 0) {
+		pmatch->rm_so = len;
+		pmatch->rm_eo = len;
+		return (ret);
+	}
+
+	/* No point in going farther if we do not have enough data. */
+	if (len < fg->len)
+		return (ret);
+
+	/* Only try once at the beginning or ending of the line. */
+	if (fg->bol || fg->eol) {
+		/* Simple text comparison. */
+		/* Verify data is >= pattern length before searching on it. */
+		if (len >= fg->len) {
+			/* Determine where in data to start search at. */
+			j = fg->eol ? len - fg->len : 0;
+			if (!((fg->bol && fg->eol) && (len != fg->len)))
+				if (grep_cmp(fg->pattern, data + j,
+				    fg->len) == -1) {
+					pmatch->rm_so = j;
+					pmatch->rm_eo = j + fg->len;
+					ret = 0;
+				}
+		}
+	} else if (fg->reversed) {
+		/* Quick Search algorithm. */
+		j = len;
+		do {
+			if (grep_cmp(fg->pattern, data + j - fg->len,
+				fg->len) == -1) {
+				pmatch->rm_so = j - fg->len;
+				pmatch->rm_eo = j;
+				ret = 0;
+				break;
+			}
+			/* Shift if within bounds, otherwise, we are done. */
+			if (j == fg->len)
+				break;
+			j -= fg->qsBc[data[j - fg->len - 1]];
+		} while (j >= fg->len);
+	} else {
+		/* Quick Search algorithm. */
+		j = pmatch->rm_so;
+		do {
+			if (grep_cmp(fg->pattern, data + j, fg->len) == -1) {
+				pmatch->rm_so = j;
+				pmatch->rm_eo = j + fg->len;
+				ret = 0;
+				break;
+			}
+
+			/* Shift if within bounds, otherwise, we are done. */
+			if (j + fg->len == len)
+				break;
+			else
+				j += fg->qsBc[data[j + fg->len]];
+		} while (j <= (len - fg->len));
+	}
+
+	return (ret);
+}
+
+/*
+ * Returns:	i >= 0 on failure (position that it failed)
+ *		-1 on success
+ */
+static int
+grep_cmp(const unsigned char *pat, const unsigned char *data, size_t len)
+{
+	size_t size;
+	wchar_t *wdata, *wpat;
+	unsigned int i;
+
+	if (iflag) {
+		if ((size = mbstowcs(NULL, (const char *)data, 0)) ==
+		    ((size_t) - 1))
+			return (-1);
+
+		wdata = grep_malloc(size * sizeof(wint_t));
+
+		if (mbstowcs(wdata, (const char *)data, size) ==
+		    ((size_t) - 1))
+			return (-1);
+
+		if ((size = mbstowcs(NULL, (const char *)pat, 0)) ==
+		    ((size_t) - 1))
+			return (-1);
+
+		wpat = grep_malloc(size * sizeof(wint_t));
+
+		if (mbstowcs(wpat, (const char *)pat, size) == ((size_t) - 1))
+			return (-1);
+		for (i = 0; i < len; i++) {
+			if ((towlower(wpat[i]) == towlower(wdata[i])) ||
+			    ((grepbehave != GREP_FIXED) && wpat[i] == L'.'))
+				continue;
+			free(wpat);
+			free(wdata);
+			return (i);
+		}
+	} else {
+		for (i = 0; i < len; i++) {
+			if ((pat[i] == data[i]) || ((grepbehave != GREP_FIXED) &&
+			    pat[i] == '.'))
+				continue;
+			return (i);
+		}
+	}
+	return (-1);
+}
+
+static void
+grep_revstr(unsigned char *str, int len)
+{
+	int i;
+	char c;
+
+	for (i = 0; i < len / 2; i++) {
+		c = str[i];
+		str[i] = str[len - i - 1];
+		str[len - i - 1] = c;
+	}
+}
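
For readers unfamiliar with the Quick Search shift table that fastcomp() and grep_search() build on, here is a minimal sketch for a plain literal pattern, assuming no '.' wildcards, anchors or case folding. The names qs_build and qs_search are hypothetical and not part of this commit. The sketch uses the textbook default shift of pattern length + 1, whereas the committed code initializes qsBc[] to the pattern length (or a smaller value when dots are present).

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: fill the bad-character table.  A byte that does
 * not occur in the pattern allows a shift of m + 1; a byte whose
 * rightmost occurrence is at index i allows a shift of m - i.
 */
static void
qs_build(const unsigned char *pat, size_t m, size_t shift[UCHAR_MAX + 1])
{
	size_t i;

	for (i = 0; i <= UCHAR_MAX; i++)
		shift[i] = m + 1;
	for (i = 0; i < m; i++)
		shift[pat[i]] = m - i;
}

/*
 * Hypothetical helper: return the offset of the first occurrence of
 * pat in text, or -1 if there is none.
 */
static long
qs_search(const unsigned char *pat, size_t m, const unsigned char *text,
    size_t n)
{
	size_t shift[UCHAR_MAX + 1];
	size_t j = 0;

	if (m == 0 || n < m)
		return (-1);
	qs_build(pat, m, shift);
	while (j <= n - m) {
		if (memcmp(pat, text + j, m) == 0)
			return ((long)j);
		/* The byte just past the window decides the shift. */
		if (j + m == n)
			break;
		j += shift[text[j + m]];
	}
	return (-1);
}
```

The shift is looked up from the byte immediately after the current window, which is why the loop stops before reading text[n]. grep_search() performs the same bounds check before indexing qsBc[].
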

Added: head/usr.bin/grep/file.c
==============================================================================
--- /dev/null	00:00:00 1970	(empty, because file is newly added)
+++ head/usr.bin/grep/file.c	Thu Jul 22 19:11:57 2010	(r210389)
@@ -0,0 +1,255 @@
+/*	$OpenBSD: file.c,v 1.11 2010/07/02 20:48:48 nicm Exp $	*/
+
+/*-
+ * Copyright (c) 1999 James Howard and Dag-Erling Coïdan Smørgrav
+ * Copyright (C) 2008-2009 Gabor Kovesdan <gabor at FreeBSD.org>
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+
+#include <sys/cdefs.h>
+__FBSDID("$FreeBSD$");
+
+#include <sys/param.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+
+#include <bzlib.h>
+#include <err.h>
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <wchar.h>
+#include <wctype.h>
+#include <zlib.h>
+
+#include "grep.h"
+
+static char	 fname[MAXPATHLEN];	/* file name */
+
+#define		 MAXBUFSIZ	(16 * 1024)
+#define		 PREREAD_M	0.2
+
+/* Some global variables for the buffering and reading. */
+static char	*lnbuf;
+static size_t	 lnbuflen;
+static unsigned char *binbuf;
+static int	 binbufsiz;
+unsigned char	*binbufptr;
+static int	 bzerr;
+
+#define iswbinary(ch)	(!iswspace((ch)) && iswcntrl((ch)) && \
+			    (ch != L'\b') && (ch != L'\0'))
+
+/*
+ * Returns a single character according to the file type.
+ * Returns -1 on failure.
+ */
+int
+grep_fgetc(struct file *f)
+{
+	unsigned char c;
+
+	switch (filebehave) {
+	case FILE_STDIO:
+		return (fgetc(f->f));
+	case FILE_GZIP:
+		return (gzgetc(f->gzf));
+	case FILE_BZIP:
+		BZ2_bzRead(&bzerr, f->bzf, &c, 1);
+		if (bzerr == BZ_STREAM_END)
+			return (-1);
+		else if (bzerr != BZ_SEQUENCE_ERROR && bzerr != BZ_OK)
+			errx(2, "%s", getstr(2));
+		return (c);
+	}
+	return (-1);
+}
+
+/*
+ * Returns true if the file position is at EOF, returns false
+ * otherwise.
+ */
+int
+grep_feof(struct file *f)
+{
+
+	switch (filebehave) {
+	case FILE_STDIO:
+		return (feof(f->f));
+	case FILE_GZIP:
+		return (gzeof(f->gzf));
+	case FILE_BZIP:
+		return (bzerr == BZ_STREAM_END);
+	}
+	return (1);
+}
+
+/*
+ * At the first call, fills in an internal buffer and checks if the given
+ * file is a binary file and sets the binary flag accordingly.  Then returns
+ * a single line and sets len to the length of the returned line.
+ * At any other call returns a single line either from the internal buffer
+ * or from the file if the buffer is exhausted and sets len to the length
+ * of the line.
+ */
+char *
+grep_fgetln(struct file *f, size_t *len)
+{
+	struct stat st;
+	size_t bufsiz, i = 0;
+	int ch = 0;
+
+	/* Fill in the buffer if it is empty. */
+	if (binbufptr == NULL) {
+
+		/* Only pre-read to the buffer if we need the binary check. */
+		if (binbehave != BINFILE_TEXT) {
+			if (f->stdin)
+				st.st_size = MAXBUFSIZ;
+			else if (stat(fname, &st) != 0)
+				err(2, NULL);
+
+			bufsiz = (MAXBUFSIZ > (st.st_size * PREREAD_M)) ?
+			    (st.st_size / 2) : MAXBUFSIZ;
+
+			binbuf = grep_malloc(sizeof(char) * bufsiz);
+
+			while (i < bufsiz) {
+				ch = grep_fgetc(f);
+				if (ch == EOF)
+					break;
+				binbuf[i++] = ch;
+			}
+
+			f->binary = memchr(binbuf, (filebehave != FILE_GZIP) ?
+			    '\0' : '\200', i - 1) != NULL;
+		}
+		binbufsiz = i;
+		binbufptr = binbuf;
+	}
+
+	/* Read a line whether from the buffer or from the file itself. */
+	for (i = 0; !(grep_feof(f) &&
+	    (binbufptr == &binbuf[binbufsiz])); i++) {
+		if (binbufptr == &binbuf[binbufsiz]) {
+			ch = grep_fgetc(f);
+		} else {
+			ch = binbufptr[0];
+			binbufptr++;
+		}
+		if (i >= lnbuflen) {
+			lnbuflen *= 2;
+			lnbuf = grep_realloc(lnbuf, ++lnbuflen);
+		}
+		if ((ch == '\n') || (ch == EOF)) {
+			lnbuf[i] = '\0';
+			break;
+		} else
+			lnbuf[i] = ch;
+	}
+	if (grep_feof(f) && (i == 0) && (ch != '\n'))
+		return (NULL);
+	*len = i;
+	return (lnbuf);
+}
+
+/*
+ * Opens the standard input for processing.
+ */
+struct file *
+grep_stdin_open(void)
+{
+	struct file *f;
+
+	snprintf(fname, sizeof fname, "%s", getstr(1));
+
+	f = grep_malloc(sizeof *f);
+
+	if ((f->f = fdopen(STDIN_FILENO, "r")) != NULL) {
+		f->stdin = true;
+		return (f);
+	}
+
+	free(f);
+	return (NULL);
+}
+
+/*
+ * Opens a normal, a gzipped or a bzip2 compressed file for processing.
+ */
+struct file *
+grep_open(const char *path)
+{
+	struct file *f;
+
+	snprintf(fname, sizeof fname, "%s", path);
+
+	f = grep_malloc(sizeof *f);
+
+	f->stdin = false;
+	switch (filebehave) {
+	case FILE_STDIO:
+		if ((f->f = fopen(path, "r")) != NULL)
+			return (f);
+		break;
+	case FILE_GZIP:
+		if ((f->gzf = gzopen(fname, "r")) != NULL)
+			return (f);
+		break;
+	case FILE_BZIP:
+		if ((f->bzf = BZ2_bzopen(fname, "r")) != NULL)
+			return (f);
+		break;
+	}
+
+	free(f);
+	return (NULL);
+}
+
+/*
+ * Closes a normal, a gzipped or a bzip2 compressed file.
+ */
+void
+grep_close(struct file *f)
+{
+
+	switch (filebehave) {
+	case FILE_STDIO:
+		fclose(f->f);
+		break;
+	case FILE_GZIP:
+		gzclose(f->gzf);
+		break;
+	case FILE_BZIP:
+		BZ2_bzclose(f->bzf);
+		break;
+	}
+
+	/* Reset read buffer for the file we are closing */
+	binbufptr = NULL;
+	free(binbuf);
+
+}
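
The binary check that grep_fgetln() performs on its first call can be illustrated in isolation. The following is a simplified sketch, not the committed logic: looks_binary and PREREAD_MAX are hypothetical names, the sketch scans every pre-read byte for NUL, and it ignores the gzip case in which the committed code searches for '\200' instead.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed cap on how many leading bytes are inspected. */
#define PREREAD_MAX	(16 * 1024)

/*
 * Hypothetical helper: treat the data as binary if a NUL byte appears
 * within the first PREREAD_MAX bytes of the buffer.
 */
static bool
looks_binary(const unsigned char *buf, size_t len)
{
	if (len > PREREAD_MAX)
		len = PREREAD_MAX;
	return (len > 0 && memchr(buf, '\0', len) != NULL);
}
```

Capping the scan keeps the check cheap on large files, at the cost of missing NUL bytes that first appear after the inspected prefix.
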

Added: head/usr.bin/grep/grep.1
==============================================================================
--- /dev/null	00:00:00 1970	(empty, because file is newly added)
+++ head/usr.bin/grep/grep.1	Thu Jul 22 19:11:57 2010	(r210389)
@@ -0,0 +1,461 @@
+.\"	$FreeBSD$
+.\"	$OpenBSD: grep.1,v 1.38 2010/04/05 06:30:59 jmc Exp $
+.\" Copyright (c) 1980, 1990, 1993
+.\"	The Regents of the University of California.  All rights reserved.
+.\"
+.\" Redistribution and use in source and binary forms, with or without
+.\" modification, are permitted provided that the following conditions
+.\" are met:
+.\" 1. Redistributions of source code must retain the above copyright
+.\"    notice, this list of conditions and the following disclaimer.
+.\" 2. Redistributions in binary form must reproduce the above copyright
+.\"    notice, this list of conditions and the following disclaimer in the
+.\"    documentation and/or other materials provided with the distribution.
+.\" 3. Neither the name of the University nor the names of its contributors
+.\"    may be used to endorse or promote products derived from this software
+.\"    without specific prior written permission.
+.\"
+.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
+.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+.\" SUCH DAMAGE.
+.\"
+.\"	@(#)grep.1	8.3 (Berkeley) 4/18/94
+.\"
+.Dd September 19, 2009
+.Dt GREP 1
+.Os
+.Sh NAME
+.Nm grep , egrep , fgrep ,
+.Nm zgrep , zegrep , zfgrep
+.Nd file pattern searcher
+.Sh SYNOPSIS
+.Nm grep
+.Bk -words
+.Op Fl abcdDEFGHhIiJLlmnOopqRSsUVvwxZ
+.Op Fl A Ar num
+.Op Fl B Ar num
+.Op Fl C Ns Op Ar num
+.Op Fl e Ar pattern
+.Op Fl f Ar file
+.Op Fl Fl binary-files Ns = Ns Ar value
+.Op Fl Fl color Ns Op = Ns Ar when
+.Op Fl Fl colour Ns Op = Ns Ar when
+.Op Fl Fl context Ns Op = Ns Ar num
+.Op Fl Fl label
+.Op Fl Fl line-buffered
+.Op Fl Fl null
+.Op Ar pattern
+.Op Ar
+.Ek
+.Sh DESCRIPTION
+The
+.Nm grep
+utility searches any given input files,
+selecting lines that match one or more patterns.
+By default, a pattern matches an input line if the regular expression
+(RE) in the pattern matches the input line
+without its trailing newline.
+An empty expression matches every line.
+Each input line that matches at least one of the patterns is written
+to the standard output.
+.Pp
+.Nm grep
+is used for simple patterns and
+basic regular expressions
+.Pq BREs ;
+.Nm egrep
+can handle extended regular expressions
+.Pq EREs .
+See
+.Xr re_format 7
+for more information on regular expressions.
+.Nm fgrep
+is quicker than both
+.Nm grep
+and
+.Nm egrep ,
+but can only handle fixed patterns
+(i.e. it does not interpret regular expressions).
+Patterns may consist of one or more lines,
+allowing any of the pattern lines to match a portion of the input.
+.Pp
+.Nm zgrep ,
+.Nm zegrep ,
+and
+.Nm zfgrep
+act like
+.Nm grep ,
+.Nm egrep ,
+and
+.Nm fgrep ,
+respectively, but accept input files compressed with the
+.Xr compress 1
+or
+.Xr gzip 1
+compression utilities.
+.Pp
+The following options are available:
+.Bl -tag -width indent
+.It Fl A Ar num , Fl Fl after-context Ns = Ns Ar num
+Print
+.Ar num
+lines of trailing context after each match.
+See also the
+.Fl B
+and
+.Fl C
+options.
+.It Fl a , Fl Fl text
+Treat all files as ASCII text.
+Normally
+.Nm
+will simply print
+.Dq Binary file ... matches
+if files contain binary characters.
+Use of this option forces
+.Nm
+to output lines matching the specified pattern.
+.It Fl B Ar num , Fl Fl before-context Ns = Ns Ar num
+Print
+.Ar num
+lines of leading context before each match.
+See also the
+.Fl A
+and
+.Fl C
+options.
+.It Fl b , Fl Fl byte-offset
+The offset in bytes of a matched pattern is
+displayed in front of the respective matched line.
+.It Fl C Ns Op Ar num , Fl Fl context Ns = Ns Ar num
+Print
+.Ar num
+lines of leading and trailing context surrounding each match.
+The default is 2 and is equivalent to
+.Fl A
+.Ar 2
+.Fl B
+.Ar 2 .
+Note:
+no whitespace may be given between the option and its argument.
+.It Fl c , Fl Fl count
+Only a count of selected lines is written to standard output.
+.It Fl Fl colour Ns = Ns Op Ar when , Fl Fl color Ns = Ns Op Ar when
+Mark up the matching text with the expression stored in
+.Ev GREP_COLOR
+environment variable.
+The possible values of when can be `never', `always' or `auto'.
+.It Fl D Ar action , Fl Fl devices Ns = Ns Ar action
+Specify the demanded action for devices, FIFOs and sockets.
+The default action is `read', which means that they are read
+as if they were normal files.
+If the action is set to `skip', devices will be silently skipped.
+.It Fl d Ar action , Fl Fl directories Ns = Ns Ar action
+Specify the demanded action for directories.
+It is `read' by default, which means that the directories
+are read in the same manner as normal files.
+Other possible values are `skip' to silently ignore the
+directories, and `recurse' to read them recursively, which
+has the same effect as the
+.Fl R
+and
+.Fl r
+options.
+.It Fl E , Fl Fl extended-regexp
+Interpret
+.Ar pattern
+as an extended regular expression
+(i.e. force
+.Nm grep
+to behave as
+.Nm egrep ) .
+.It Fl e Ar pattern , Fl Fl regexp Ns = Ns Ar pattern
+Specify a pattern used during the search of the input:
+an input line is selected if it matches any of the specified patterns.
+This option is most useful when multiple
+.Fl e
+options are used to specify multiple patterns,
+or when a pattern begins with a dash
+.Pq Sq - .
+.It Fl Fl exclude
+If
+.Fl R
+is specified, it excludes files matching the given
+filename pattern.
+.It Fl Fl exclude-dir
+If
+.Fl R
+is specified, it excludes directories matching the
+given filename pattern.
+.It Fl F , Fl Fl fixed-strings
+Interpret
+.Ar pattern
+as a set of fixed strings
+(i.e. force
+.Nm grep
+to behave as
+.Nm fgrep ) .
+.It Fl f Ar file , Fl Fl file Ns = Ns Ar file
+Read one or more newline separated patterns from
+.Ar file .
+Empty pattern lines match every input line.
+Newlines are not considered part of a pattern.
+If
+.Ar file
+is empty, nothing is matched.
+.It Fl G , Fl Fl basic-regexp
+Interpret
+.Ar pattern
+as a basic regular expression
+(i.e. force
+.Nm grep
+to behave as traditional
+.Nm grep ) .
+.It Fl H
+Always print filename headers with output lines.
+.It Fl h , Fl Fl no-filename
+Never print filename headers
+.Pq i.e. filenames
+with output lines.
+.It Fl Fl help
+Print a brief help message.
+.It Fl I
+Ignore binary files.
+This option is equivalent to
+.Fl Fl binary-files Ns = Ns Ar without-match
+option.
+.It Fl i , Fl Fl ignore-case
+Perform case insensitive matching.
+By default,
+.Nm grep
+is case sensitive.
+.It Fl Fl include
+If
+.Fl R
+is specified, it includes the files matching the
+given filename pattern.
+.It Fl Fl include-dir
+If
+.Fl R
+is specified, it includes the directories matching the
+given filename pattern.
+.It Fl J , Fl Fl bz2decompress
+Decompress the
+.Xr bzip2 1
+compressed file before looking for the text.
+.It Fl L , Fl Fl files-without-match
+Only the names of files not containing selected lines are written to
+standard output.
+Pathnames are listed once per file searched.
+If the standard input is searched, the string
+.Dq (standard input)
+is written.
+.It Fl l , Fl Fl files-with-matches
+Only the names of files containing selected lines are written to
+standard output.
+.Nm grep
+will only search a file until a match has been found,
+making searches potentially less expensive.
+Pathnames are listed once per file searched.
+If the standard input is searched, the string
+.Dq (standard input)
+is written.
+.It Fl Fl mmap
+Use
+.Xr mmap 2
+instead of
+.Xr read 2
+to read input, which can result in better performance under some
+circumstances but can cause undefined behaviour.
+.It Fl m Ar num , Fl Fl max-count Ns = Ns Ar num
+Stop reading the file after
+.Ar num
+matches.
+.It Fl n , Fl Fl line-number
+Each output line is preceded by its relative line number in the file,
+starting at line 1.
+The line number counter is reset for each file processed.
+This option is ignored if
+.Fl c ,
+.Fl L ,
+.Fl l ,
+or
+.Fl q
+is
+specified.
+.It Fl Fl null
+Prints a zero-byte after the file name.
+.It Fl O
+If
+.Fl R
+is specified, follow symbolic links only if they were explicitly listed
+on the command line.

*** DIFF OUTPUT TRUNCATED AT 1000 LINES ***