FYI: SVN to GIT converter currently broken, github is falling behind

Ulrich Spörlein uqs at
Sun Nov 8 10:32:41 UTC 2015

2015-11-08 2:51 GMT+01:00 Alfred Perlstein <alfred at>:
> Uli,
> One of the biggest concerns I've heard from folks using FreeBSD's git mirror
> is that the hashes can change.
> I have a question about this. Is it possible to keep track of what the
> "official" git mirror (on github) is doing and keep that as a log? Then
> that log can be used to replay commits when there is a divergence problem.
> What I'm basically saying is that let's take this small example:
> importer is working fine @rev 10000
> imports 10000
> imports 10001
> imports 10002
> something happens to importer to give indeterminate shas.
> imports 10003 - sha is "unstable" sha3
> imports 10004 - sha is "unstable" sha4
> imports 10005 - sha is "unstable" sha5
> imports 10006 - sha is "unstable" sha6
> importer is fixed
> At this point normally we'd rewind the importer to 10002 and then force
> update the affected branches.
> My question is... can the imports of 10003, 10004, 10005 and 10006 be put
> into the importer such that any "mirror site" that re-does the import using
> the most up to date importer will get the same shas?
> That would allow us to proceed with 10007, etc. without force pushing.
> This should be possible based on querying "git" for the meta data associated
> with sha3..sha6 and then forcing those commits to have the same meta data.
> This would eliminate the concern about shas in the mirror changing that I've
> heard.

The goal of the conversion is that everyone can re-do the conversion
in their basement and come up with the same history and checksums.
This was not the case when I first started, as there was some
non-deterministic hash structure being used in svn2git. This was fixed
in the code and then all converter runs produced the very same
history and checksums.

The scenario that we have right now is that one of the merge commits
done about two weeks ago is being handled differently by svn2git w/ svn
v1.8 vs. svn v1.9, and I haven't investigated yet how the API's
behavior changed to cause this. I'm afraid I also swapped out all my
knowledge about svn2git internals and will have to redo this all from
scratch :/
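
To illustrate why one differently-handled merge commit is enough to
change every later hash: a git commit id is the SHA-1 of the commit
object, and that object lists its parents. The sketch below is not
svn2git code; the helper names, author string and parent ids are made
up for the example, only the object layout is git's own.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """Hash an object the way git does: sha1("<type> <size>\\0" + body)."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def commit_id(tree: str, parents: list, who: str, message: str) -> str:
    """Build a minimal commit object and return its id.

    The same string is used for author and committer to keep the
    example short; real commits may differ there.
    """
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # parent order is part of the hash
    lines.append(f"author {who}")
    lines.append(f"committer {who}")
    body = "\n".join(lines) + "\n\n" + message
    return git_object_id("commit", body.encode())


# Hypothetical inputs: the empty tree plus two invented parent ids.
TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
P1, P2 = "1" * 40, "2" * 40
WHO = "A Developer <dev@example.org> 1446978761 +0000"

# Same tree, same message, but once recorded as a merge and once not.
as_merge = commit_id(TREE, [P1, P2], WHO, "merge vendor branch\n")
as_plain = commit_id(TREE, [P1], WHO, "merge vendor branch\n")

# The next commit names its parent, so the difference propagates.
child_of_merge = commit_id(TREE, [as_merge], WHO, "next change\n")
child_of_plain = commit_id(TREE, [as_plain], WHO, "next change\n")

print(as_merge != as_plain, child_of_merge != child_of_plain)
```

The file contents are identical in both variants; only the recorded
ancestry differs, and that alone yields different ids from that commit
onwards.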

Your suggestion could only work if we hard-code this svn revision
special handling into svn2git, either in the code or by providing more
mappings and rules to the process. svn2git should run hermetically and
not poke at github's commits to see how things were handled in the past.
It has to be self-sufficient and must not depend on github.

This would also only work if the "breakage" window was very small,
but it is already about two weeks long and will surely increase till I
find the proper fix.

So, to take a stand here: this sort of kludge is unlikely to ever
happen. Git commit hashes *might* change in the future. I really don't
see how this is a big deal anyway.  It happened once and I'm trying to
have it never happen again. But why are people afraid of this
happening? Every "official" git commit is tagged with an SVN revision
and the contents of those revisions are obviously correct (just not
the ancestry and the commit objects, possibly). So it would be easy to
write a script that replays VendorA's git history and swaps out the
new official commits for the old official commits. There would be no
merge conflicts.
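
A rough sketch of the lookup such a script would need, pairing old and
new official commits by their SVN revision. This is only an
illustration: the "revision=N" marker format, the function names and
the short placeholder hashes are assumptions for the example, and the
actual replaying of a downstream branch is left out.

```python
import re

# Assumed marker format; adjust to however the revision is really recorded.
REV_RE = re.compile(r"revision=(\d+)")


def index_by_revision(commits: dict) -> dict:
    """Map SVN revision number -> commit hash.

    `commits` maps a commit hash to the text that carries the SVN
    revision marker (commit message or note). Commits without a
    marker are skipped.
    """
    by_rev = {}
    for sha, text in commits.items():
        match = REV_RE.search(text)
        if match:
            by_rev[int(match.group(1))] = sha
    return by_rev


def build_hash_map(old_commits: dict, new_commits: dict) -> dict:
    """Map each old hash to the new hash of the same SVN revision.

    Only revisions present on both sides and whose hash actually
    changed end up in the result.
    """
    old_by_rev = index_by_revision(old_commits)
    new_by_rev = index_by_revision(new_commits)
    return {
        old_sha: new_by_rev[rev]
        for rev, old_sha in old_by_rev.items()
        if rev in new_by_rev and new_by_rev[rev] != old_sha
    }


# Hypothetical data: revision 10002 kept its hash, 10003 did not.
old = {
    "aaa111": "svn path=/head/; revision=10002",
    "bbb222": "svn path=/head/; revision=10003",
}
new = {
    "aaa111": "svn path=/head/; revision=10002",
    "ccc333": "svn path=/head/; revision=10003",
}
print(build_hash_map(old, new))
```

A downstream history could then be replayed with every parent found in
this map swapped for its counterpart; inverting the map gives the
other direction.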

I can see how this would be annoying if you have 100 developers and
dozens of branches that are far from mainline FreeBSD. But I'm sure
these companies that depend on git will come forward and donate some
of their developer manpower to help me with keeping the converter
stable/deterministic. Right? Right? :) :)

