regex status report #4
Gabor Kovesdan
gabor at kovesdan.org
Sun Jun 19 14:56:59 UTC 2011
Hi,
this week I tested the code further and, contrary to my earlier
impressions, I noticed that the performance actually varies. With sed it
usually performs like the old code, and in some cases it is even
significantly better. It seems that grep is just an extreme case that is
very sensitive to regex performance. So I decided to clean up what I
have so far and publish a patch for testing. If people find the
out-of-the-box performance good enough, we can proceed with the first
phase of replacing the regex code. It has to be tested and checked
thoroughly, though, which is why I want to provide a patch as soon as
possible. grep will still use the GNU regex code, so its performance
will not be affected. The patch will be ready soon.
Apart from this, I've been looking at how to optimize the performance.
There are a couple of ideas that could possibly work: a simple matcher
for fixed and simple expressions; optimizing the internals of the code;
wrapping the matcher in a heuristic matcher that isolates the possibly
matching part and applies the heavier algorithm only to that narrower
context; etc. I have to think about which techniques should be used with
TRE and then implement them. I haven't written any optimization code yet
because I first want to see clearly how it should be done.
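
To make the heuristic-matcher idea concrete, here is a minimal sketch in
Python. It is not TRE code and not a settled design. The function name
prefiltered_search and the required_literal parameter are hypothetical,
and the sketch assumes that the caller already knows a fixed substring
that every match must contain and that a match never spans a newline.

```python
import re


def prefiltered_search(pattern, required_literal, text):
    """Return the first line of text that matches pattern, or None.

    required_literal is a hypothetical hint: a fixed substring that
    every match of pattern is assumed to contain. A cheap substring
    search finds candidate positions, and the heavier regex engine is
    run only on the line around each candidate.
    """
    regex = re.compile(pattern)
    pos = 0
    while True:
        # Cheap step: plain substring search for the required literal.
        hit = text.find(required_literal, pos)
        if hit == -1:
            return None
        # Isolate the narrower context: the line containing the hit.
        start = text.rfind("\n", 0, hit) + 1
        end = text.find("\n", hit)
        if end == -1:
            end = len(text)
        line = text[start:end]
        # Heavy step: run the full matcher on that line only.
        if regex.search(line):
            return line
        # No match on this line, continue after it.
        pos = end + 1
```

For example, prefiltered_search(r"bar [0-9]+", "bar", text) runs the
regex only on lines that contain "bar" and skips all the others. How to
derive such a literal from an arbitrary expression is the hard part and
is left open here.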
Gabor