svn commit: r211463 - head/usr.bin/grep

David Schultz das at FreeBSD.ORG
Mon Nov 29 20:16:08 UTC 2010


On Wed, Aug 18, 2010, Dimitry Andric wrote:
> On 2010-08-18 22:48, mdf at FreeBSD.org wrote:
> >>  - Refactor file reading code to use pure syscalls and an internal buffer
> >>    instead of stdio.  This gives BSD grep a very big performance boost,
> >>    its speed is now almost comparable to GNU grep.
> > 
> > I didn't read all of the details in the profiling mails in the thread,
> > but does this mean that work on stdio would give a performance boost
> > to many apps?  Or is there something specific about how grep(1) is
> > using its input that makes it a horse of a different color?
> 
> Originally, it was reading files one character at a time using fgetc(3),
> the locking version even.  This is usually not the fastest way to read
> a large file with stdio. :)
> 
> If grep did not have to support .gz or .bz2 files, we could just have
> plugged in stdio's fgetln(3).  I tried this approach first on some
> uncompressed files, and it performed much better than fgetc'ing.
> 
> The reading code that has now been committed uses basically the same
> algorithm as fgetln() does internally, but it can handle gzip and bzip2
> input too.
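
A rough sketch of the approach Dimitry describes (an internal buffer refilled by raw reads and scanned for newlines, much as fgetln() does) is below. It is not the committed file.c code: the `lb_*` names and the `lb_readfn` callback are hypothetical, and the callback is the assumed hook that would let one reader serve read(2), gzread() or BZ2_bzRead() alike.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical read callback: returns bytes read, 0 at EOF, -1 on error.
 * A plain file would wrap read(2); compressed input would wrap gzread()
 * or BZ2_bzRead().
 */
typedef ssize_t (*lb_readfn)(void *src, void *buf, size_t len);

struct linebuf {
	lb_readfn	 readfn;
	void		*src;
	char		*buf;	/* internal buffer */
	size_t		 cap;	/* allocated size of buf */
	size_t		 off;	/* start of unconsumed data */
	size_t		 len;	/* bytes of unconsumed data */
	int		 eof;	/* readfn has reported EOF */
};

int
lb_init(struct linebuf *lb, lb_readfn readfn, void *src)
{
	lb->readfn = readfn;
	lb->src = src;
	lb->cap = 4096;
	lb->off = lb->len = 0;
	lb->eof = 0;
	lb->buf = malloc(lb->cap);
	return (lb->buf == NULL ? -1 : 0);
}

void
lb_free(struct linebuf *lb)
{
	free(lb->buf);
	lb->buf = NULL;
}

/*
 * Return the next line without its '\n' and store its length in *lenp.
 * The line is not NUL-terminated and points into lb->buf, so it is only
 * valid until the next call.  Returns NULL at EOF or on error.
 */
char *
lb_getline(struct linebuf *lb, size_t *lenp)
{
	for (;;) {
		char *start = lb->buf + lb->off;
		char *nl = memchr(start, '\n', lb->len);

		if (nl != NULL) {
			*lenp = (size_t)(nl - start);
			lb->off += *lenp + 1;
			lb->len -= *lenp + 1;
			return (start);
		}
		if (lb->eof) {
			if (lb->len == 0)
				return (NULL);
			/* Last line has no trailing newline. */
			*lenp = lb->len;
			lb->off += lb->len;
			lb->len = 0;
			return (start);
		}
		/* Move the partial line to the front; grow if it fills buf. */
		memmove(lb->buf, start, lb->len);
		lb->off = 0;
		if (lb->len == lb->cap) {
			char *nbuf = realloc(lb->buf, lb->cap * 2);

			if (nbuf == NULL)
				return (NULL);
			lb->buf = nbuf;
			lb->cap *= 2;
		}
		ssize_t n = lb->readfn(lb->src, lb->buf + lb->len,
		    lb->cap - lb->len);
		if (n < 0)
			return (NULL);
		if (n == 0)
			lb->eof = 1;
		lb->len += (size_t)n;
	}
}
```

A caller would loop on `lb_getline()` until it returns NULL. Because the returned line points into the internal buffer, this sketch hands lines out without copying them, at the cost of each line only staying valid until the next call.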

The gzip limitations you refer to could perhaps be worked around
with a simple application of funopen(3).  IIRC, the overhead
inherent in using fgetln(3) or getline(3) on reasonably long lines
is very small; if it's not, we should look at ways to improve stdio.
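
A minimal sketch of the funopen(3) idea: wrap a decoder in a read-only `FILE *` so that ordinary stdio line readers work on it. funopen() is the BSD interface; glibc offers fopencookie() instead, so the sketch picks one at compile time. The `struct decoder` type, `decoder_fopen()`, and the ROT13 transform are hypothetical stand-ins for a real gzFile and gzread(), chosen so the example needs no extra libraries.

```c
#define _GNU_SOURCE	/* exposes fopencookie() and getline() on glibc */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * Hypothetical decoder state.  A real version would hold a gzFile and
 * call gzread(); ROT13 over a memory buffer stands in here.
 */
struct decoder {
	const char	*p;	/* remaining encoded input */
	size_t		 left;	/* bytes of input remaining */
};

/* Decode up to len bytes into buf; returns the count, 0 at EOF. */
static size_t
decoder_read(struct decoder *d, char *buf, size_t len)
{
	size_t n = len < d->left ? len : d->left;

	for (size_t i = 0; i < n; i++) {
		char c = d->p[i];

		if (c >= 'a' && c <= 'z')
			c = (char)('a' + (c - 'a' + 13) % 26);
		else if (c >= 'A' && c <= 'Z')
			c = (char)('A' + (c - 'A' + 13) % 26);
		buf[i] = c;
	}
	d->p += n;
	d->left -= n;
	return (n);
}

/* Called by fclose(); frees the decoder owned by the stream. */
static int
cookie_close(void *cookie)
{
	free(cookie);
	return (0);
}

#if defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__) || \
    defined(__APPLE__)
/* funopen(3) read callback: int-sized lengths. */
static int
cookie_read(void *cookie, char *buf, int len)
{
	return ((int)decoder_read(cookie, buf, (size_t)len));
}

FILE *
decoder_fopen(struct decoder *d)
{
	/* Read-only stream: no write or seek callbacks. */
	return (funopen(d, cookie_read, NULL, NULL, cookie_close));
}
#else
/* fopencookie(3) read callback: size_t lengths. */
static ssize_t
cookie_read(void *cookie, char *buf, size_t len)
{
	return ((ssize_t)decoder_read(cookie, buf, len));
}

FILE *
decoder_fopen(struct decoder *d)
{
	cookie_io_functions_t io = { .read = cookie_read, .close = cookie_close };

	return (fopencookie(d, "r", io));
}
#endif
```

Once `decoder_fopen()` returns, callers can read lines with getline(3) (or fgetln(3) on BSD) exactly as they would from a plain file, and fclose() releases the decoder.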

There's still a locking operation and a memcpy() that can't really
be avoided with stdio, though. With getline(), you'd be able to
delete most of file.c, but it would never be quite as fast.
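
For comparison, a getline(3)-based reading loop might look like the sketch below. `count_matching_lines()` is a hypothetical name, and strstr() is a fixed-string stand-in for grep's real regex matching (it also stops at embedded NUL bytes, which grep has to handle). In typical implementations each getline() call locks the stream and copies the line out of the stdio buffer into `line`, which is the overhead mentioned above.

```c
#define _POSIX_C_SOURCE 200809L	/* for getline() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/*
 * Count lines of fp containing the fixed string pat, like grep -c.
 * Returns -1 on a read error.
 */
long
count_matching_lines(FILE *fp, const char *pat)
{
	char *line = NULL;	/* getline() allocates and grows this */
	size_t cap = 0;
	ssize_t len;
	long count = 0;

	while ((len = getline(&line, &cap, fp)) != -1) {
		/* Strip the newline so it never takes part in a match. */
		if (len > 0 && line[len - 1] == '\n')
			line[len - 1] = '\0';
		if (strstr(line, pat) != NULL)
			count++;
	}
	free(line);
	return (ferror(fp) ? -1 : count);
}
```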


More information about the svn-src-all mailing list