why GNU grep is fast

Dag-Erling Smørgrav des at des.no
Sun Aug 22 11:54:40 UTC 2010


Mike Haertel <mike at ducky.net> writes:
> GNU grep uses the well-known Boyer-Moore algorithm, which looks
> first for the final letter of the target string, and uses a lookup
> table to tell it how far ahead it can skip in the input whenever
> it finds a non-matching character.

Boyer-Moore is for fixed search strings.  I don't see how that
optimization can work with a regexp search unless the regexp is so
simple that you break it down into a small number of cases with known
length and final character.

> GNU grep uses raw Unix input system calls and avoids copying data
> after reading it.

Yes, that was the first thing we looked at (and fixed) in BSD grep.

> Moreover, GNU grep AVOIDS BREAKING THE INPUT INTO LINES.  Looking
> for newlines would slow grep down by a factor of several times,
> because to find the newlines it would have to look at every byte!

Amen.  The current bottleneck in BSD grep is the memchr() that looks for
'\n' in the input buffer.

DES
-- 
Dag-Erling Smørgrav - des at des.no


More information about the freebsd-current mailing list