[RFC] Replacing our regex implementation
Bakul Shah
bakul at bitblocks.com
Mon May 9 01:49:39 UTC 2011
On Mon, 09 May 2011 02:37:10 BST Gabor Kovesdan <gabor at kovesdan.org> wrote:
> On 09-05-2011 02:17, Bakul Shah wrote:
> > As per the following URLs re2 is much faster than TRE (on the
> > benchmarks they ran):
> >
> > http://lh3lh3.users.sourceforge.net/reb.shtml
> > http://sljit.sourceforge.net/regex_perf.html
> >
> > re2 is in C++ & has a PCRE API, while TRE is in C & has a
> > POSIX API. Both have BSD copyright. Is it worth considering
> > making re2 posix compliant?
> Is it wchar-clean and is it actively maintained? C++ is quite
> anticipated for the base system and I'm not very skilled in it so atm I
> couldn't promise to use re2 instead of TRE. And anyway, can C++ go into
> libc? According to POSIX, the regex code has to be there. But let's see
> what others say... If we happen to use re2 later, my extensions that I
> talked about in points 2, and 3, would still be useful.
>
> Anyway, according to some earlier vague measures, TRE seems to be slower
> in small matching tasks but scales well. These tests seem to compare
> only short runs with the same regex. It should be seen how they compare
> e.g. if you grep the whole ports tree with the same pattern. If the
> matching scales well once the pattern is compiled, that's more important
> than the overall result for such short tasks, imho.
re2 is certainly maintained. Don't know about wchar cleanliness.
See
http://code.google.com/p/re2/
Also check out Russ Cox's excellent articles on implementing it
http://swtch.com/~rsc/regexp/
and this:
http://google-opensource.blogspot.com/2010/03/re2-principled-approach-to-regular.html
C++ may be an impediment for it to go into libc but one can
certainly put a C interface on a C++ library.
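
To make the "C interface on a C++ library" point concrete, here is a
rough sketch. It is not re2's actual API: std::regex stands in for the
C++ engine so the example is self-contained, and every xre_* name is
made up for illustration. The idea is an opaque handle plus extern "C"
functions that never let a C++ exception escape to the C caller.

```cpp
// Sketch only: std::regex stands in for a C++ engine such as re2.
// All xre_* names are hypothetical and not part of any real library.
#include <regex>

extern "C" {
typedef struct xre xre;                            // opaque handle for C callers
xre *xre_compile(const char *pattern);             // NULL on a bad pattern
int  xre_search(const xre *re, const char *text);  // 1 match, 0 no match, -1 error
void xre_free(xre *re);
}

// The C++ side of the handle; C code never sees this definition.
struct xre {
    std::regex impl;
};

extern "C" xre *xre_compile(const char *pattern) {
    try {
        // POSIX extended syntax, chosen here only for illustration.
        return new xre{std::regex(pattern, std::regex::extended)};
    } catch (...) {
        return nullptr;  // exceptions must not cross the C boundary
    }
}

extern "C" int xre_search(const xre *re, const char *text) {
    try {
        return std::regex_search(text, re->impl) ? 1 : 0;
    } catch (...) {
        return -1;
    }
}

extern "C" void xre_free(xre *re) {
    delete re;
}
```

A C caller would only need the declarations in the extern "C" block
(normally placed in a header). Whether the C++ runtime this pulls in
is acceptable for libc is a separate question.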