Subversion/CVS experiment summary

Mon Feb 9 13:56:10 PST 2004

On Monday 09 February 2004 03:06 pm, Stijn Hoop wrote:
> Well, that explains a lot -- for some reason I tested using
> $LastChangedRevision: 7921 $. I'll try with an up-to-date one then.

I was looking through the change history for cvs2svn.py, and it seems that the
0.37 version is almost exactly the same as the 0.35 version.  For some reason
it looks like they just re-tagged the old version rather than bringing in the
changes from HEAD...

> > One thing that may have made a difference is that so far I've been
> > importing things in chunks rather than trying to do the whole repo at
> > once.
>
> Yes, I was afraid though that commits might have spanned subtrees. But then
> again, even if they did they would just get committed as separate revisions
> to the tree, and I suppose one could live with that.

There probably are some commits that do.  The only reason I did it that way
was to catch failure cases more quickly, without having to wait for stage 1 to
finish on the whole repo.  My plan has always been to go back and convert the
whole thing once I was sure it would import cleanly and I had the resources to
do it (the machine with the fastest CPU probably doesn't have enough disk space
right now to handle it).

> > Does the Python version do the same thing?  I didn't think to look at
> > memory usage very closely while it was running :-/
>
> As far as I understood it builds a disk cache instead of using malloc().
> This might explain the slowness :)

OK, that's consistent with what I saw here.  It looked like it created several
large temporary Berkeley DB (bdb) databases, but I don't remember any excessive
swapping going on.
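To illustrate the trade-off being described (intermediate data kept in an
on-disk database rather than in malloc()'d memory), here is a minimal sketch
using Python's standard shelve module. This is an assumption-laden
illustration, not cvs2svn's actual code; build_disk_index, lookup, and the
path/revision records are hypothetical.

```python
import os
import shelve
import tempfile
from typing import Iterable, List, Tuple


def build_disk_index(records: Iterable[Tuple[str, str]], db_path: str) -> None:
    """Collect values per key in an on-disk shelf instead of an in-memory dict.

    Hypothetical helper: keys might be RCS file paths, values revision numbers.
    """
    # flag="n" always creates a new, empty database file.
    with shelve.open(db_path, flag="n") as db:
        for key, value in records:
            entries: List[str] = db.get(key, [])
            entries.append(value)
            # Reassign so the updated list is pickled back to disk
            # (shelve does not track in-place mutation without writeback=True).
            db[key] = entries


def lookup(db_path: str, key: str) -> List[str]:
    """Read one key back from the shelf, opened read-only."""
    with shelve.open(db_path, flag="r") as db:
        return db.get(key, [])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "revisions")
        build_disk_index(
            [("src/a.c", "1.5"), ("doc/x.txt", "1.2"), ("src/a.c", "1.6")],
            path,
        )
        print(lookup(path, "src/a.c"))  # ['1.5', '1.6']
```

Each update is pickled and written through to the dbm file, which keeps memory
use bounded at the cost of disk I/O on every access; that is one plausible
contributor to the slowness mentioned above.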

> Yes, that's the idea. You 'just' need a tool that can determine changesets
> from a CVS repository to automate this. See
>
> http://wiki.gnuarch.org/moin.cgi/Arch_20and_20CVS_20in_20the_20same_20tree
>
> but substitute Subversion for arch :)

Makes sense.  I believe you mentioned earlier that post-commit hooks could be 
used for this?  But that of course requires assistance from the repomaster.  
It might also be possible to rig up a script to monitor the cvs-all mailing 
list and get its changesets from there...
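Whichever source feeds it (a commit hook or the cvs-all mail), the per-file
changes still have to be grouped into changesets, since CVS records revisions
per file. Below is a minimal sketch of one common heuristic, assuming that
revisions by the same author with the same log message, no more than a few
minutes apart, belong together. FileRevision, group_changesets, and the
300-second window are hypothetical choices, not taken from any particular tool.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class FileRevision:
    """Hypothetical record for one per-file CVS revision (e.g. parsed from rlog)."""
    path: str
    revision: str
    author: str
    log: str
    timestamp: int  # seconds since the epoch


# Assumed grouping window in seconds; real tools may use a different value.
COMMIT_WINDOW = 300


def group_changesets(revs: List[FileRevision],
                     window: int = COMMIT_WINDOW) -> List[List[FileRevision]]:
    """Group per-file revisions into approximate changesets, oldest first."""
    changesets: List[List[FileRevision]] = []
    # Most recent open changeset for each (author, log message) pair.
    open_sets: Dict[Tuple[str, str], List[FileRevision]] = {}
    for rev in sorted(revs, key=lambda r: r.timestamp):
        key = (rev.author, rev.log)
        current = open_sets.get(key)
        fits = (
            current is not None
            # close enough in time to the last revision already in the group
            and rev.timestamp - current[-1].timestamp <= window
            # a single changeset should touch each file at most once
            and all(r.path != rev.path for r in current)
        )
        if fits:
            current.append(rev)
        else:
            current = [rev]
            open_sets[key] = current
            changesets.append(current)
    return changesets


if __name__ == "__main__":
    revs = [
        FileRevision("src/a.c", "1.5", "craig", "Fix build", 1000),
        FileRevision("src/b.c", "1.9", "craig", "Fix build", 1030),
        FileRevision("doc/x.txt", "1.2", "stijn", "Typo", 1040),
        FileRevision("src/a.c", "1.6", "craig", "Fix build", 5000),
    ]
    for cs in group_changesets(revs):
        print([(r.path, r.revision) for r in cs])
```

With the sample data this prints three changesets: the two "Fix build" files
together, the doc typo on its own, and the later src/a.c change on its own,
since it falls outside the window. Each group could then be replayed as one
Subversion commit.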

> It is, but it does work. Maybe I'll test and see if I can 'port' those
> scripts to Subversion :)

Yes, it does work as long as your users are relatively trusted and you keep
good backups :).  Still, using that over ssh would probably be the most
painless transition path.

As for the speed test: ARGH!  svnadmin dump died on me with this message:
* Dumped revision 18576.
* Dumped revision 18577.
* Dumped revision 18578.
* Dumped revision 18579.
* Dumped revision 18580.
svn: Invalid change ordering: non-add change on deleted path

If it's really invalid, I wonder how it ended up in the repo in the first
place.  Not good.  I'll have to do some digging to find out what causes it.

Craig