[PATCH] Fix cvsweb.cgi to grok logs pasted into logs

Jonathan Noack noackjr at alumni.rice.edu
Thu Jun 8 09:03:28 UTC 2006


On 06/07/06 08:23, Anton Berezin wrote:
> Basically it uses a hack of feeding rlog with -z+00 option, which
> happens to modify the dates in the resulting log from "2006/06/05
> 00:00:35" to "2006-06-05 00:00:35+00".  The resulting output is still
> somewhat ambiguous, but this ambiguity is *substantially* less likely to
> confuse cvsweb, unless one specially crafts the commit log.

... and users shouldn't specify "-z+00" because UTC is already the
default in rlog.  Brilliant.  This is an ingenious idea and is better
than any of the hacks I considered.  See my "proper" solution further down.
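
To make the idea concrete, here is a minimal sketch of why the changed
date shape helps. It is written in Python rather than CVSweb's Perl,
purely for illustration. The regular expressions and the helper name
is_header_date_line are assumptions, not CVSweb's actual code; only the
two date shapes are taken from the message above.

```python
import re

# Date shape printed by plain rlog, e.g. "2006/06/05 00:00:35".
OLD_DATE = re.compile(r"^date: \d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2};")
# Date shape printed with -z+00, e.g. "2006-06-05 00:00:35+00".
ZONED_DATE = re.compile(r"^date: \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\+00;")


def is_header_date_line(line: str) -> bool:
    """Hypothetical helper: accept only the zoned form as a real header."""
    return ZONED_DATE.match(line) is not None


# A line pasted into a commit message from ordinary rlog output.
pasted = "date: 2006/06/05 00:00:35;  author: someone;  state: Exp;"
# A genuine header line as produced when rlog is run with -z+00.
real = "date: 2006-06-05 00:00:35+00;  author: someone;  state: Exp;"

print(is_header_date_line(pasted))
print(is_header_date_line(real))
```

A log pasted from -z+00 output would still match, which is the
remaining ambiguity Anton mentions.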

> This way of fixing the problem is admittedly going for a low-hanging
> fruit, since the proper proper PROPER solution would involve not using
> rlog at all and doing all the RCS parsing in-place.

Actually, doing the RCS parsing in-place is *really* hard to do without
destroying performance.  I tried for about a week before throwing in the
towel.  Consider this:

$ time rlog /home/ncvs/src/UPDATING,v > /dev/null
real    0m0.045s
user    0m0.045s
sys     0m0.000s
$ time perl -e 'open(FILE,"/home/ncvs/src/UPDATING,v");
while(<FILE>){print $_;}' > /dev/null
real    0m0.059s
user    0m0.059s
sys     0m0.000s
$ time perl -e 'open(FILE,"/home/ncvs/src/UPDATING,v");
while(read(FILE,$buffer,4096)){print $buffer;}' > /dev/null
real    0m0.014s
user    0m0.014s
sys     0m0.000s

Just grabbing each line and printing it out in Perl takes longer than it
does for rlog to produce usable output!  The reason for this is that
grabbing a line in Perl invokes the regex engine, which is expensive.  I
tried using a buffer and tokenizing input myself (into either characters
or logical tokens), but parsing the rlog output was twice as fast as my
best effort to do the RCS parsing in-place.  Fixing rlog is a much
better "proper" solution.
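
For anyone who wants to repeat the line-versus-block comparison without
access to the repository, here is a rough sketch in Python (not Perl, so
the absolute numbers are not comparable to the ones above). The
synthetic input and the function names are assumptions made for
illustration; it copies the same in-memory data line by line and in
4096-byte blocks and reports the time for each.

```python
import io
import time


def copy_by_line(src, dst):
    """Copy one line at a time, like while(<FILE>){print}."""
    for line in src:
        dst.write(line)


def copy_by_block(src, dst, size=4096):
    """Copy fixed-size blocks, like read(FILE,$buffer,4096)."""
    while True:
        chunk = src.read(size)
        if not chunk:
            break
        dst.write(chunk)


def timed(copy, data):
    """Run one copy strategy over in-memory data; return (seconds, output)."""
    src, dst = io.BytesIO(data), io.BytesIO()
    start = time.perf_counter()
    copy(src, dst)
    return time.perf_counter() - start, dst.getvalue()


# Synthetic stand-in for an RCS file; a real ,v file will behave differently.
data = b"some line of an RCS file\n" * 200_000

line_secs, line_out = timed(copy_by_line, data)
block_secs, block_out = timed(copy_by_block, data)
print(f"by line:  {line_secs:.4f}s")
print(f"by block: {block_secs:.4f}s")
```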

> The patched up cvsweb showing FreeBSD repository is currently running
> here http://www.tobez.org/cgi-bin/cvsweb.cgi , so that you can see the
> difference for yourself, for example:
> 
> http://www.freebsd.org/cgi/cvsweb.cgi/ports/devel/p5-Config-Fast/Makefile
> and
> http://www.tobez.org/cgi-bin/cvsweb.cgi/ports/devel/p5-Config-Fast/Makefile

There are two places where CVSweb parses rlog output; you forgot to
update getDirLogs (note that UPDATING is missing):
http://www.tobez.org/cgi-bin/cvsweb.cgi/src/?only_with_tag=RELENG_6_1

> This message's purpose is two-fold:
> 
> - I would like the patch to be incorporated upstream, hence the relevant
>   people are Cc'ed;

I guess you mean me :).  This issue really annoys me and as the new
CVSweb maintainer I'm determined to fix it.  I really like your idea and
I think I will incorporate it into CVSweb.  Right now I am in the middle
of a modularization/rewrite, so it may be a while before it hits the tree.

Also, I am pursuing what I consider the "proper" solution: fixing rlog!
I worked with the RCS folks to hash out a commit log byte count option
for rlog.  This allows CVSweb to know exactly how long to read for the
commit log, eliminating any ambiguity.
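
A minimal sketch of how a consumer could use such a byte count, in
Python for illustration. The "log-bytes: N" header line is a purely
hypothetical format invented for this example; the real option's output
is not described in this message and may look quite different.

```python
import io


def read_counted_log(stream):
    """Read one commit log announced by a hypothetical 'log-bytes: N' line."""
    prefix = b"log-bytes: "  # hypothetical field name, not real rlog output
    header = stream.readline()
    if not header.startswith(prefix):
        raise ValueError("expected a byte-count header")
    count = int(header[len(prefix):].decode("ascii").strip())
    log = stream.read(count)
    if len(log) != count:
        raise ValueError("log shorter than announced")
    return log


# A commit log that contains pasted rlog output, the case that confuses
# a parser relying on separator lines.
tricky_log = (
    b"Merge from HEAD:\n"
    b"----------------------------\n"
    b"revision 1.2\n"
    b"date: 2006/06/05 00:00:35;  author: someone;  state: Exp;\n"
    b"Fix a typo.\n"
)
next_entry = b"----------------------------\nrevision 1.1\n"

stream = io.BytesIO(b"log-bytes: %d\n" % len(tricky_log) + tricky_log + next_entry)
log = read_counted_log(stream)
rest = stream.read()
```

Because the reader consumes exactly the announced number of bytes, the
separator and "revision" lines inside the log are never interpreted.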

Patches (including my RCS in-place attempts):
http://www.noacks.org/cvsweb/

Test site:
http://www.noacks.org/cgi-test/cvsweb.cgi

> - I would like the patch to be incorporated into our running cvsweb.

I think it would be great if we could update the main site to 3.0.x.  I
hope to work with the www@ folks to make this happen.

-Jonathan

-- 
Jonathan Noack | noackjr at alumni.rice.edu | OpenPGP: 0x991D8195
