why GNU grep is fast
Sean C. Farley
scf at FreeBSD.org
Sun Aug 22 16:30:08 UTC 2010
On Sun, 22 Aug 2010, Dag-Erling Smørgrav wrote:
> Mike Haertel <mike at ducky.net> writes:
>> GNU grep uses the well-known Boyer-Moore algorithm, which looks first
>> for the final letter of the target string, and uses a lookup table to
>> tell it how far ahead it can skip in the input whenever it finds a
>> non-matching character.
>
> Boyer-Moore is for fixed search strings. I don't see how that
> optimization can work with a regexp search unless the regexp is so
> simple that you break it down into a small number of cases with known
> length and final character.
When I was working on making FreeGrep faster (years ago), I wrote down a
few notes about possible algorithms, especially those that could be
useful for fgrep functionality. I am just passing these onto the list.
Some algorithms:
1. http://en.wikipedia.org/wiki/Aho-Corasick_string_matching_algorithm
2. http://en.wikipedia.org/wiki/Rabin-Karp_string_search_algorithm
3. GNU fgrep: Commentz-Walter
4. GLIMPSE: http://webglimpse.net/pubs/TR94-17.pdf (Boyer-Moore variant)
Also, this may be a useful book:
http://www.dcc.uchile.cl/~gnavarro/FPMbook/
Sean
--
scf at FreeBSD.org
More information about the freebsd-current
mailing list