[PATCH] Fix cvsweb.cgi to grok logs pasted into logs

Anton Berezin tobez at freebsd.org
Wed Jun 7 07:05:34 PDT 2006


It is a well-known fact that cvsweb cannot cope with cvs log output
pasted into commit messages.  Comments like this are not uncommon on
FreeBSD mailing lists:

   Hmm, you should not quote cvs history in your cvs commit messages. 
   It confuses the tools like CVSweb, which does not show anything below
   the dashed line for your commit.

To me, it sounds like "our tools suck, so don't do that", and I am not
sure I agree with the attitude.  So below is a patch against cvsweb 3.0.6.

Basically, it uses a hack: rlog is run with the -z+00 option, which
happens to change the dates in its output from "2006/06/05 00:00:35" to
"2006-06-05 00:00:35+00".  The output is still somewhat ambiguous, but
the ambiguity is *substantially* less likely to confuse cvsweb unless
someone specially crafts the commit log.
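Why the date format matters: in the patched readLog(), a "revision" line is only accepted as a real revision if the next line matches the -z+00 date pattern.  An old-style "2006/06/05 00:00:35" line pasted into a commit message fails that match, so the parser bails out and folds the pasted text into the previous revision's log instead of treating it as a new entry.  The following is a minimal standalone sketch of just that check, not part of the patch; parse_date_line is a hypothetical helper name, and the pattern is the patch's date regex minus its optional "lines:" capture.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Date-line pattern from the patched readLog(), minus the optional
# "lines:" capture.  It only accepts the "-z+00" style, e.g.
# "date: 2006-06-05 00:00:35+00;  author: ...;  state: ...;".
my $DATE_RE = qr{^date:\s+(\d+)-(\d+)-(\d+)\s+(\d+):(\d+):(\d+)\+00;\s+author:\s+(\S+);\s+state:\s+(\S+);};

# Hypothetical helper (not in cvsweb): returns the UTC epoch time for a
# genuine rlog date line, or undef for anything else, such as an
# old-style "2006/06/05 00:00:35" line pasted into a commit message.
sub parse_date_line {
    my ($line) = @_;
    return unless $line =~ $DATE_RE;
    # Time::Local treats a four-digit year as the actual year.
    return timegm($6, $5, $4, $3, $2 - 1, $1);
}

# Classify a real -z+00 line and a pasted slash-style line.
for my $line (
    "date: 2006-06-05 00:00:35+00;  author: tobez;  state: Exp;  lines: +2 -1",
    "date: 2006/06/05 00:00:35;  author: tobez;  state: Exp;  lines: +2 -1",
) {
    my $when = parse_date_line($line);
    printf "%-7s %s\n", defined $when ? 'real' : 'pasted', $line;
}
```

Run as-is, the first line is reported as real and the second as pasted.  As noted above, a commit log that deliberately imitates the -z+00 format would still fool this check.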

This way of fixing the problem admittedly goes for the low-hanging
fruit, since the proper proper PROPER solution would involve not using
rlog at all and doing all the RCS parsing inside cvsweb itself.

A patched cvsweb showing the FreeBSD repository is currently running at
http://www.tobez.org/cgi-bin/cvsweb.cgi so you can see the difference
for yourself.  For example, compare:

http://www.freebsd.org/cgi/cvsweb.cgi/ports/devel/p5-Config-Fast/Makefile
and
http://www.tobez.org/cgi-bin/cvsweb.cgi/ports/devel/p5-Config-Fast/Makefile

This message's purpose is two-fold:

- I would like the patch to be incorporated upstream, hence the relevant
  people are Cc'ed;
- I would like the patch to be incorporated into our running cvsweb.

The problem with the latter wish is that Simon says [:-)] that we still
run 2.something, so it would be advisable to upgrade to 3.0.6 first.
Also, Simon does not have time to do that at the moment, hence this mail
to the list.

So, what do you folks think?

Cheers,
\Anton.

--- cvsweb.cgi.orig	Wed Jun  7 11:57:23 2006
+++ cvsweb.cgi	Wed Jun  7 13:59:24 2006
@@ -2655,9 +2655,9 @@ sub readLog($;$)
     $revision = defined($revision) ? "-r$revision" : '';
     if ($revision =~ /\./) {
       # Normal revision, not a branch/tag name.
-      exec($CMD{rlog}, $revision, $fullname) or exit -1;
+      exec($CMD{rlog}, "-z+00", $revision, $fullname) or exit -1;
     } else {
-      exec($CMD{rlog}, $fullname) or exit -1;
+      exec($CMD{rlog}, "-z+00", $fullname) or exit -1;
     }
   }
 
@@ -2696,50 +2696,76 @@ sub readLog($;$)
   # becomes smth like
   # revision 9.19       locked by: vassilii;
 
-  logentry:
+  my $state = 'wantrev';
+  my @data;
+  my $data = { bailout => "", log => "" };
 
-  while ($_ !~ LOG_FILESEPR) {
-    $_ = <$fh>;
-    last logentry if (!defined($_));    # EOF
-    if (/^revision (\d+(?:\.\d+)+)/) {
-      $rev = $1;
-      unshift(@allrevisions, $rev);
-    } elsif ($_ =~ LOG_FILESEPR || $_ =~ LOG_REVSEPR) {
-      next logentry;
+  LOGENTRY:
+  while (<$fh>) {
+    if ($state eq 'wantlog') {
+      if ($_ =~ LOG_FILESEPR || $_ =~ LOG_REVSEPR) {
+        push @data, $data if exists $data->{rev};
+        $data  = { bailout => $_, log => "" };
+        $state = 'wantrev';
+      } else {
+        $data->{log} .= $_;
+      }
+    } elsif ($state eq 'wantrev') {
+      if ($_ =~ LOG_FILESEPR || $_ =~ LOG_REVSEPR) {
+        $data->{bailout} .= $_;
+        next LOGENTRY;
+	  }
+      goto BAILOUT unless /^revision (\d+(?:\.\d+)+)/;
+      $data->{rev}      = $1;
+      $data->{bailout} .= $_;
+      $state = 'wantdate';
+    } elsif ($state eq 'wantdate') {
+      if (
+        m|^date:\s+(\d+)-(\d+)-(\d+)\s+(\d+):(\d+):(\d+)\+00;\s+author:\s+(\S+);\s+state:\s+(\S+);\s+(lines:\s+([0-9\s+-]+))?|
+        )
+      {
+        my $yr             = $1;
+        $yr               -= 1900 if ($yr > 100); # Damn 2-digit year routines :-)
+        $data->{date}      = timegm($6, $5, $4, $3, $2 - 1, $yr);
+        $data->{author}    = $7;
+        $data->{state}     = $8;
+        $data->{difflines} = $10;
+        $state             = 'wantbranches';
+      } else {
+        goto BAILOUT;
+      }
+    } elsif ($state eq 'wantbranches') {
+      $state = 'wantlog';
+      if (/^branches:\s/) {
+        next LOGENTRY;
+      } else {
+        redo LOGENTRY;
+      }
     } else {
-
-      # The rlog output is syntactically ambiguous.  We must
-      # have guessed wrong about where the end of the last log
-      # message was.
-      # Since this is likely to happen when people put rlog output
-      # in their commit messages, don't even bother keeping
-      # these lines since we don't know what revision they go with
-      # any more.
-      next logentry;
+      fatal("500 Internal Error", 'Wrong state during RCS output parsing: %s', $_);
     }
-    $_ = <$fh>;
-    if (
-      m|^date:\s+(\d+)/(\d+)/(\d+)\s+(\d+):(\d+):(\d+);\s+author:\s+(\S+);\s+state:\s+(\S+);\s+(lines:\s+([0-9\s+-]+))?|
-      )
-    {
-      my $yr           = $1;
-      $yr             -= 1900 if ($yr > 100); # Damn 2-digit year routines :-)
-      $date{$rev}      = timegm($6, $5, $4, $3, $2 - 1, $yr);
-      $author{$rev}    = $7;
-      $state{$rev}     = $8;
-      $difflines{$rev} = $10;
+    next LOGENTRY;
+    BAILOUT:
+    if (@data) {
+      # bailout, pasted log entry detected
+      $data[-1]->{log} .= "$data->{bailout}$_";
+      $data             = pop @data;
+      $state            = 'wantlog';
     } else {
       fatal("500 Internal Error", 'Error parsing RCS output: %s', $_);
     }
-
-  line:
-    while (<$fh>) {
-      next line if (/^branches:\s/);
-      last line if ($_ =~ LOG_FILESEPR || $_ =~ LOG_REVSEPR);
-      $log{$rev} .= $_;
-    }
   }
   close($fh);
+
+  # postprocess
+  for $data (@data) {
+    unshift @allrevisions, $data->{rev};
+    $date{$data->{rev}}      = $data->{date};
+    $author{$data->{rev}}    = $data->{author};
+    $state{$data->{rev}}     = $data->{state};
+    $difflines{$data->{rev}} = $data->{difflines};
+    $log{$data->{rev}}       = $data->{log};
+  }
 
   @revorder = reverse sort { revcmp($a, $b) } @allrevisions;
 

-- 
An undefined problem has an infinite number of solutions.
-- Robert A. Humphrey