[RFC] Replacing our regex implementation

Bakul Shah bakul at bitblocks.com
Mon May 9 01:49:39 UTC 2011


On Mon, 09 May 2011 02:37:10 BST Gabor Kovesdan <gabor at kovesdan.org>  wrote:
> On 09-05-2011 02:17, Bakul Shah wrote:
> > As per the following URLs re2 is much faster than TRE (on the
> > benchmarks they ran):
> >
> > http://lh3lh3.users.sourceforge.net/reb.shtml
> > http://sljit.sourceforge.net/regex_perf.html
> >
> > re2 is in C++ & has a PCRE API, while TRE is in C & has a
> > POSIX API.  Both have BSD copyright. Is it worth considering
> > making re2 posix compliant?
> Is it wchar-clean and is it actively maintained? C++ is quite 
> anticipated for the base system and I'm not very skilled in it so atm I 
> couldn't promise to use re2 instead of TRE. And anyway, can C++ go into 
> libc? According to POSIX, the regex code has to be there. But let's see 
> what others say... If we happen to use re2 later, my extensions that I 
> talked about in points 2, and 3, would still be useful.
> 
> Anyway, according to some earlier vague measures, TRE seems to be slower 
> in small matching tasks but scales well. These tests seem to compare 
> only short runs with the same regex. It should be seen how they compare 
> e.g. if you grep the whole ports tree with the same pattern. If the 
> matching scales well once the pattern is compiled, that's more important 
> than the overall result for such short tasks, imho.

re2 is certainly maintained. Don't know about wchar cleanliness.
See 
    http://code.google.com/p/re2/
Also check out Russ Cox's excellent articles on implementing it
    http://swtch.com/~rsc/regexp/
and this:
    http://google-opensource.blogspot.com/2010/03/re2-principled-approach-to-regular.html

C++ may be an impediment to putting it in libc, but one can
certainly put a C interface on a C++ library.
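
A rough sketch of the C-interface idea, not a worked-out proposal.
Assumptions: std::regex (C++11) stands in for the real engine so that
the example builds without re2 installed, and the names xre,
xre_compile, xre_search and xre_free are made up for illustration;
none of them come from re2, TRE or libc. The C caller only ever sees
an opaque pointer, and no C++ exception is allowed to escape.

```cpp
#include <regex>

// Opaque to C callers: they only ever hold a pointer to this struct.
struct xre {
    std::regex re;   // stand-in for the real C++ engine object
};

extern "C" {

// Compile a POSIX-extended pattern. Returns NULL on a bad pattern,
// a NULL argument, or allocation failure.
xre *xre_compile(const char *pattern) {
    if (pattern == nullptr) return nullptr;
    try {
        return new xre{std::regex(pattern, std::regex::extended)};
    } catch (...) {
        return nullptr;   // exceptions must not cross the C boundary
    }
}

// Returns 1 if the pattern matches somewhere in text, 0 if it does
// not, and -1 on error.
int xre_search(const xre *h, const char *text) {
    if (h == nullptr || text == nullptr) return -1;
    try {
        return std::regex_search(text, h->re) ? 1 : 0;
    } catch (...) {
        return -1;
    }
}

// Release a handle returned by xre_compile. NULL is accepted.
void xre_free(xre *h) { delete h; }

}  // extern "C"
```

A C program would declare "struct xre;" plus the three prototypes in
a header and link against the C++ object. Note that this only hides
the language from the caller; the library still depends on the C++
runtime, which is the part that matters for libc.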


More information about the freebsd-hackers mailing list