Issue with grep -i (on i386 only?)

Tue Nov 3 22:14:50 UTC 2009

On Tuesday 03 November 2009 22:19:05 Gabor Kovesdan wrote:
> Mel Flynn wrote:
> > Hi,
> >
> > Attached is a little test script for grep's -i performance. I tried a few
> > different machines, and the 64-bit 7.2 machine I could steal doesn't seem
> > to be affected and outperforms pcregrep.
> 
> Note that pcregrep doesn't use POSIX regex, so it's not a good basis for
> comparison. PCRE provides a POSIX-compliant interface to
> Perl-compatible regex for those who are already familiar with the
> former, but it's still Perl regex and not POSIX! That's why some people
> get confused when PCRE comes up.

I realize this, but for the case in question it does not matter. Both
'regexes' should behave the same in PCRE and POSIX. I provided the comparison
to show that the 'problem of case insensitive comparison' is solvable, at the
very least for the simple case.

> > On i386 machines, grep -i is significantly slower:
> > i386, 7.2-STABLE of Sep 8, load averages: 0.00, 0.02, 0.00,
> > Mem: 336M Active, 442M Inact, 217M Wired, 38M Cache, 112M Buf, 198M Free
> > dev.cpu.0.freq: 2992 (Intel P-IV HTT enabled)
> > 16Meg file result:
> > =>>> 16777216
> >     =>>> fgrep
> >         0.04 real         0.02 user         0.01 sys
> >         0.04 real         0.03 user         0.01 sys
> >     =>>> pcregrep
> >         0.21 real         0.19 user         0.02 sys
> >         0.21 real         0.20 user         0.00 sys
> >     =>>> grep
> >         0.04 real         0.02 user         0.01 sys << not -i
> >         3.64 real         3.61 user         0.01 sys << -i
> 
> It's an interesting observation, I have never heard of this.
> 
> > So it looks to me that, while there is a problem with case insensitive
> > comparison, just rewriting the expression is an optimization grep could
> > perform.
> > Either way, with the new text tools being written (done?), is this problem
> > being attacked, not fixable due to specifications, or not considered an
> > issue? Are any PRs needed, or are there any I missed? Patches to try?
> >
> > [And it just occurred to me bsdgrep is in ports]:
> >     =>>> bsdgrep
> >         0.93 real         0.74 user         0.00 sys
> >         4.80 real         4.33 user         0.02 sys
> >         4.97 real         4.34 user         0.01 sys
> >
> > So here the optimization does not fly.
> 
> Unfortunately, this is the most important issue with BSDL texttools. In
> the grep case, the BSDL version is ready and feature-complete, but the
> performance isn't quite satisfying. The main reason for this is that GNU
> grep uses a lot of shortcuts, which results in bloated code (8000 LOC),
> while BSDL grep keeps everything simple and straightforward (1500 LOC).
> IMO, the desired solution would be to keep grep small and get a modern
> regex library for FreeBSD, which performs well. Pushing regex
> optimizations into grep is a bad idea because it not only bloats the
> code, but other regex users won't benefit from the optimization, so the
> problem should be fixed at its root. And the current regex library we
> have is old, slow and doesn't support wchar at all.

With this kind of difference, I don't really care who performs the
optimization, but it seems that multiple alternatives at the same character
position are not handled very well, with an extra penalty for "case
insensitive". Why this isn't present on my 64-bit machine is a bit of a
mystery to me, but since almost no time is spent in sys, I can't blame it on
the kernel.
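
A minimal sketch of the kind of rewrite discussed above, assuming that
"rewriting the expression" means expanding each letter of a literal pattern
into a two-character bracket expression. The helper name ci_to_brackets is
hypothetical and is not taken from the attached test script; the file name
in the usage comments is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not part of the attached script):
# turn a literal pattern into a case-sensitive equivalent by expanding
# every letter into a bracket expression, e.g. "foo" -> "[fF][oO][oO]".
# Non-letters are copied through unchanged, so this is only meant for
# literal patterns, not for patterns that already contain brackets or
# backslash escapes.
ci_to_brackets() {
    printf '%s\n' "$1" | awk '{
        out = ""
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            lo = tolower(c)
            up = toupper(c)
            if (lo != up)
                out = out "[" lo up "]"   # letter: list both cases
            else
                out = out c               # anything else: keep as-is
        }
        print out
    }'
}

# Usage sketch (testfile is a placeholder for the generated input file):
#   time grep -c -i 'foo' testfile
#   time grep -c "$(ci_to_brackets 'foo')" testfile
```

Both commands should report the same match count; only the timings are of
interest when comparing the -i path against the rewritten pattern.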

> Btw, do you mind if I include your script into the BSD grep
> distribution? I already planned to write something like this for future
> testing.

Consider it public domain.
-- 
Mel