Issue with grep -i (on i386 only?)

Tue Nov 3 21:19:14 UTC 2009

Mel Flynn wrote:
> Hi,
>
> attached a little test script for grep's -i performance. I tried a few 
> different machines and the 64-bit 7.2 machine I could steal doesn't seem to be
> affected and outperforms pcregrep.
>   
Note that pcregrep doesn't use POSIX regex, so it's not a good basis for
comparison. PCRE provides a POSIX-compliant interface to Perl-compatible
regex for those who are already familiar with the POSIX API, but it's
still Perl regex and not POSIX! That's why some people get confused when
PCRE comes up.
> On i386 machines, grep -i is significantly slower:
> i386, 7.2-STABLE of Sep 8, load averages: 0.00, 0.02, 0.00,
> Mem: 336M Active, 442M Inact, 217M Wired, 38M Cache, 112M Buf, 198M Free
> dev.cpu.0.freq: 2992 (Intel P-IV HTT enabled)
> 16Meg file result:
> =>>> 16777216
>     =>>> fgrep
>         0.04 real         0.02 user         0.01 sys
>         0.04 real         0.03 user         0.01 sys
>     =>>> pcregrep
>         0.21 real         0.19 user         0.02 sys
>         0.21 real         0.20 user         0.00 sys
>     =>>> grep
>         0.04 real         0.02 user         0.01 sys << not -i
>         3.64 real         3.61 user         0.01 sys << -i
>   
It's an interesting observation; I have never heard of this.
> So it looks to me that, while there is a problem with case insensitive 
> comparison, just rewriting the expression is an optimization grep could 
> perform.
> Either way, with the new text tools being written (done?) is this problem 
> being attacked, not fixable due to specifications or not considered an issue?
> Any PR's needed / I missed? Patches to try?
>
> [And it just occurred to me bsdgrep is in ports]:
>     =>>> bsdgrep
>         0.93 real         0.74 user         0.00 sys
>         4.80 real         4.33 user         0.02 sys
>         4.97 real         4.34 user         0.01 sys
>
> So here the optimization does not fly.
Unfortunately, this is the most important issue with BSDL texttools. In
the grep case, the BSDL version is ready and feature-complete, but the
performance isn't quite satisfying. The main reason for this is that GNU
grep uses a lot of shortcuts, which results in bloated code (8000 LOC),
while BSDL grep keeps everything simple and straightforward (1500 LOC).
IMO, the desired solution would be to keep grep small and get a modern
regex library for FreeBSD that performs well. Pushing regex
optimizations into grep is a bad idea because it not only bloats the
code, but other regex users won't benefit from the optimization, so the
problem should be fixed at its root. And the current regex library we
have is old, slow and doesn't support wchar at all.
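
To make the "rewriting the expression" idea concrete, here is a rough
sketch in Python. It assumes the rewrite in question means expanding each
letter of a fixed pattern into a two-character bracket expression (foo
becomes [fF][oO][oO]); that reading is an assumption, and the helper names
fold_case_literal and matches_any_case are hypothetical, not taken from
GNU grep or BSD grep. It says nothing about whether the rewritten pattern
is actually faster with a given regex library.

```python
import re

# Characters that are special in a POSIX extended regular expression.
ERE_SPECIAL = set(".[]()*+?{}|^$\\")


def fold_case_literal(literal):
    """Rewrite a fixed string into a pattern that matches any case mix.

    Hypothetical helper for illustration only. Each character with
    distinct single-character lower and upper forms becomes a bracket
    expression; everything else is kept, escaped if it is special.
    """
    parts = []
    for ch in literal:
        lower, upper = ch.lower(), ch.upper()
        if lower != upper and len(lower) == 1 and len(upper) == 1:
            parts.append("[" + lower + upper + "]")
        elif ch in ERE_SPECIAL:
            parts.append("\\" + ch)
        else:
            parts.append(ch)
    return "".join(parts)


def matches_any_case(literal, text):
    """Search text for literal in any case, without a case-insensitive flag."""
    return re.search(fold_case_literal(literal), text) is not None
```

If a rewrite like this lived in the regex library rather than in grep,
every regex user would get it, which is the point made above.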

Btw, do you mind if I include your script in the BSD grep
distribution? I had already planned to write something like this for
future testing.
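
For reference, a timing harness of this kind could be sketched as follows.
This is not the attached script, only a minimal illustration; the function
names time_command and compare_case_modes are hypothetical, and it measures
wall-clock time only rather than the real/user/sys split shown in the
quoted results.

```python
import subprocess
import time


def time_command(argv, repeat=3):
    """Return the best wall-clock time in seconds over `repeat` runs."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        # grep exits with status 1 when nothing matches, so do not
        # treat a non-zero exit status as an error here.
        subprocess.run(
            argv,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
        best = min(best, time.perf_counter() - start)
    return best


def compare_case_modes(grep, pattern, path):
    """Time one grep binary with and without -i on the same file."""
    return {
        "plain": time_command([grep, pattern, path]),
        "-i": time_command([grep, "-i", pattern, path]),
    }
```

Calling compare_case_modes once per binary (grep, bsdgrep, and so on)
against the same test file would give numbers comparable to each other,
though not directly to the figures quoted above.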

-- 
Gabor Kovesdan
FreeBSD Volunteer

EMAIL: gabor at FreeBSD.org .:|:. gabor at kovesdan.org
WEB:   http://people.FreeBSD.org/~gabor .:|:. http://kovesdan.org