why GNU grep is fast
Gabor Kovesdan
gabor at FreeBSD.org
Mon Aug 23 10:23:09 UTC 2010
>
>> Later on, he summarizes some of the existing implementations,
>> including comments about the Plan 9 implementation and his own RE2,
>> both of which efficiently handle international text (which seems to
>> be a major concern of Gabor's).
>
> I believe Gabor is considering TRE for a good replacement regex library.
Yes. Oniguruma is slow; Google RE2 supports only Perl and fgrep syntax,
not standard regex; and the Plan 9 implementation, IIRC, supports only
fgrep syntax and Unicode, not wchar_t in general.
>
>> The key comment in Mike's GNU grep notes is the one about not
>> breaking into lines. That's simply double-scanning the input;
>> instead, run the matcher over blocks of text and, when it finds a
>> match, work backwards from the match to find the appropriate line
>> beginning. This is efficient because most lines don't match.
>
> I do like the idea.
So do I.
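
For illustration, here is a minimal Python sketch of the block-scanning
idea quoted above. It is not GNU grep's code: the function name
matching_lines and the block_size parameter are made up for this example,
Python's re module stands in for the real matcher, and the pattern is
assumed never to match a newline character.

```python
import re

def matching_lines(stream, pattern, block_size=32768):
    """Yield each line (as bytes, without its newline) containing a match.

    Hypothetical helper: the matcher runs over whole blocks, and line
    boundaries are located only around actual matches.
    """
    # MULTILINE lets ^ and $ anchor at line boundaries inside a block.
    regex = re.compile(pattern, re.MULTILINE)
    carry = b""  # trailing partial line left over from the previous block
    while True:
        chunk = stream.read(block_size)
        buf = carry + chunk
        if not buf:
            break
        if chunk:
            # Search only up to the last complete line; keep the tail.
            cut = buf.rfind(b"\n") + 1
            if cut == 0:
                carry = buf  # no newline yet, so keep reading
                continue
        else:
            cut = len(buf)  # end of input: last line may lack a newline
        carry = buf[cut:]
        pos = 0
        while pos < cut:
            m = regex.search(buf, pos, cut)
            if m is None:
                break  # most blocks end here without any per-line work
            # Work backwards from the match to the beginning of its line...
            start = buf.rfind(b"\n", 0, m.start()) + 1
            # ...and forwards to the end of that line.
            end = buf.find(b"\n", m.start(), cut)
            if end == -1:
                end = cut
            yield buf[start:end]
            pos = end + 1  # resume after the line just reported
        if not chunk:
            break
```

Calling it with an io.BytesIO object or a file opened in binary mode
yields only the matching lines; non-matching lines are never split out
individually, which is where the saving comes from.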
>
> BTW, the fastgrep portion of bsdgrep is my fault/contribution to do a
> faster search bypassing the regex library. :) It certainly was not
> written with any encodings in mind; it was purely ASCII. As I have
> not kept up with it, I do not know if anyone improved it or not.
>
It has been made wchar-compliant.
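
As a rough illustration of bypassing the regex library for simple
patterns, here is a small Python sketch. It is not the bsdgrep fastgrep
code: the names is_literal and make_matcher and the set of characters
treated as special are assumptions made for this example. It works on
str rather than bytes, so non-ASCII text is compared as characters.

```python
import re

# Characters treated as regex metacharacters in this sketch. This set is
# an assumption, not the exact check that bsdgrep performs.
_META = set(".^$*+?()[]{}|\\")

def is_literal(pattern):
    """Return True if the pattern contains no metacharacters."""
    return not any(ch in _META for ch in pattern)

def make_matcher(pattern):
    """Return a function line -> bool for the given pattern.

    Plain literals use substring search and never touch the regex
    engine; anything else falls back to re.
    """
    if is_literal(pattern):
        return lambda line: pattern in line
    regex = re.compile(pattern)
    return lambda line: regex.search(line) is not None
```

For example, make_matcher("grep") takes the substring path, while
make_matcher("gr.p") falls back to the regex engine.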
Gabor
More information about the freebsd-current mailing list