regex status report #4

Sun Jun 19 14:56:59 UTC 2011

Hi,

This week I tested the code more and, contrary to my earlier
impressions, I noticed that the performance actually varies. With sed
it usually performs like the old code, and in some cases it is even
significantly better. It seems that grep is just an extreme case that
is very sensitive to performance. So I decided to clean up what I have
so far and publish a patch for testing. If people find the
out-of-the-box performance good enough, we can proceed with the first
phase of replacing the regex code. It has to be tested and checked
thoroughly, though, which is why I want to provide a patch as soon as
possible. grep will still use the GNU regex code, so its performance
will not be affected. The patch will be ready soon.

Apart from this, I've been looking at how to optimize the performance.
There are a couple of ideas that could possibly work: a simple matcher
for fixed strings and simple expressions; optimizing the internals of
the code; wrapping the matcher with a heuristic matcher that isolates
the part that could possibly match and applies the heavier algorithm
only to that narrower context; etc. I have to think about which
techniques should be used with TRE and then implement them. I haven't
written any optimization code yet because first I want to see clearly
how it should be done.
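To make the heuristic-wrapper idea a bit more concrete, here is a
minimal sketch in C. It is only an illustration, not code from the
patch or from TRE. It assumes that a fixed literal which every match
must contain has already been extracted from the pattern (that step is
not shown), scans for it with a cheap strstr(), and runs the POSIX
regcomp()/regexec() matcher only on the line around each hit. The
function name count_matching_lines() and the grep-like, per-line
semantics are hypothetical choices for this example.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: count the lines of `text` that match the
 * extended regex `pattern`. `literal` is a fixed, non-empty substring
 * that every match is assumed to contain (e.g. "foo" for "foo[0-9]+");
 * extracting it from the pattern is not shown. Returns -1 on error. */
int count_matching_lines(const char *text, const char *pattern,
                         const char *literal)
{
    if (literal[0] == '\0')
        return -1;

    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    int count = 0;
    const char *p = text;
    const char *hit;

    /* Cheap scan: only lines containing the literal become candidates. */
    while ((hit = strstr(p, literal)) != NULL) {
        /* Widen the hit to the enclosing line. */
        const char *start = hit;
        while (start > text && start[-1] != '\n')
            start--;
        const char *end = strchr(hit, '\n');
        if (end == NULL)
            end = hit + strlen(hit);

        /* Copy the line so regexec() gets a NUL-terminated string. */
        size_t len = (size_t)(end - start);
        char *line = malloc(len + 1);
        if (line == NULL) {
            regfree(&re);
            return -1;
        }
        memcpy(line, start, len);
        line[len] = '\0';

        /* The heavier matcher runs only on this narrow context. */
        if (regexec(&re, line, 0, NULL, 0) == 0)
            count++;
        free(line);

        /* Resume after this line so each line is tested at most once. */
        if (*end == '\0')
            break;
        p = end + 1;
    }

    regfree(&re);
    return count;
}
```

For example, with the pattern "foo[0-9]+" and the literal "foo", the
text "foo1\nbar\nfoox\nfoo22 baz\n" gives 2: the line "bar" is never
handed to regexec(), and "foox" passes the cheap scan but is rejected
by the full matcher.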

Gabor