why GNU grep is fast

Sun Aug 22 16:30:08 UTC 2010

On Sun, 22 Aug 2010, Dag-Erling Smørgrav wrote:

> Mike Haertel <mike at ducky.net> writes:
>> GNU grep uses the well-known Boyer-Moore algorithm, which looks first 
>> for the final letter of the target string, and uses a lookup table to 
>> tell it how far ahead it can skip in the input whenever it finds a 
>> non-matching character.
>
> Boyer-Moore is for fixed search strings.  I don't see how that 
> optimization can work with a regexp search unless the regexp is so 
> simple that you break it down into a small number of cases with known 
> length and final character.

When I was working on making FreeGrep faster (years ago), I wrote down a 
few notes about possible algorithms, especially those that could be 
useful for fgrep functionality.  I am just passing these onto the list.

Some algorithms:
1. http://en.wikipedia.org/wiki/Aho-Corasick_string_matching_algorithm
2. http://en.wikipedia.org/wiki/Rabin-Karp_string_search_algorithm
3. GNU fgrep:  Commentz-Walter
4. GLIMPSE:  http://webglimpse.net/pubs/TR94-17.pdf (Boyer-Moore variant)

Also, this may be a useful book:
http://www.dcc.uchile.cl/~gnavarro/FPMbook/

Sean
-- 
scf at FreeBSD.org