[RFC] Replacing our regex implementation

Zhihao Yuan lichray at gmail.com
Mon May 9 03:03:17 UTC 2011


On Sun, May 8, 2011 at 8:49 PM, Bakul Shah <bakul at bitblocks.com> wrote:
> On Mon, 09 May 2011 02:37:10 BST Gabor Kovesdan <gabor at kovesdan.org> wrote:
>> On 09-05-2011 02:17, Bakul Shah wrote:
>> > As per the following URLs re2 is much faster than TRE (on the
>> > benchmarks they ran):
>> >
>> > http://lh3lh3.users.sourceforge.net/reb.shtml
>> > http://sljit.sourceforge.net/regex_perf.html
>> >
>> > re2 is in C++ & has a PCRE API, while TRE is in C & has a
>> > POSIX API.  Both are BSD-licensed. Is it worth considering
>> > making re2 POSIX-compliant?
>> Is it wchar-clean and is it actively maintained? C++ is quite
>> anticipated for the base system and I'm not very skilled in it so atm I
>> couldn't promise to use re2 instead of TRE. And anyway, can C++ go into
>> libc? According to POSIX, the regex code has to be there. But let's see
>> what others say... If we happen to use re2 later, my extensions that I
>> talked about in points 2 and 3 would still be useful.
>>
>> Anyway, according to some earlier vague measures, TRE seems to be slower
>> in small matching tasks but scales well. These tests seem to compare
>> only short runs with the same regex. It should be seen how they compare
>> e.g. if you grep the whole ports tree with the same pattern. If the
>> matching scales well once the pattern is compiled, that's more important
>> than the overall result for such short tasks, imho.
>
> re2 is certainly maintained. Don't know about wchar cleanliness.
> See
>    http://code.google.com/p/re2/
> Also check out Russ Cox's excellent articles on implementing it
>    http://swtch.com/~rsc/regexp/
> and this:
>    http://google-opensource.blogspot.com/2010/03/re2-principled-approach-to-regular.html
>
> C++ may be an impediment for it to go into libc but one can
> certainly put a C interface on a C++ library.

1. This lib accepts many popular grammars (PCRE, POSIX, vim, etc.),
but it does not allow you to change the mode.
http://code.google.com/p/re2/source/browse/re2/re2.h

2. It focuses on speed and features, not stability and standardization.

3. It uses C++. We seldom accept C++ code in the base system, and do
not accept it in libc at all.
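For contrast with point 1, the POSIX interface lets the caller choose the grammar when compiling: regcomp() parses basic regular expressions (BRE) by default and extended ones (ERE) when REG_EXTENDED is passed. A minimal sketch; posix_match() is a hypothetical helper written for this illustration, not an existing libc function:

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper for illustration; not an existing libc function.
 * Returns true if `text` matches `pattern` under the chosen grammar:
 *   extended == true  -> POSIX ERE (REG_EXTENDED)
 *   extended == false -> POSIX BRE (regcomp()'s default)
 * A pattern that fails to compile is treated as "no match". */
bool posix_match(const char *pattern, const char *text, bool extended)
{
    regex_t re;
    int cflags = REG_NOSUB | (extended ? REG_EXTENDED : 0);

    if (regcomp(&re, pattern, cflags) != 0)
        return false;

    bool matched = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return matched;
}
```

Here posix_match("ab+", "abbb", true) succeeds, while the same call with false does not, because + is an ordinary character in a BRE; the BRE form instead matches the literal text "ab+".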

So, as far as I am concerned, re2 is good as a regex engine in some
applications, but may not fit the requirements for a regex implementation in libc.




-- 
Zhihao Yuan
The best way to predict the future is to invent it.


More information about the freebsd-hackers mailing list