Next odd commit affecting `git subtree split` experiments with contrib/elftoolchain

Ed Maste emaste at freebsd.org
Wed Jun 17 16:47:12 UTC 2020


On Wed, 17 Jun 2020 at 12:00, Ulrich Spörlein <uqs at freebsd.org> wrote:
>
> Running the subtree split, I get a history with about 437 commits. I see in your https://github.com/emaste/elftoolchain/tree/split-from-cgit-beta that you only end up with 277 commits (if that display is to be trusted).

Are you using unmodified subtree split, from git port/pkg? The patch
set from Tom Clarkson improves the detection of mainline vs subtree
significantly. In the existing cgit-beta (without the MFH changes you
discussed here) it produces a subtree with tens/hundreds of thousands
of commits, because a mainline commit "leaks" into the subtree via a
merge. The patched git subtree is what I used for the split
elftoolchain that I shared.

> I'm not sure whether it would be straightforward to squash the right commits and keep
> the ones with the proper commit message. Your repo still has a few MFH commits that
> one might want to remove. Using `git filter-repo` might do the trick ...

Indeed, although I'm not particularly concerned if there are a few
stray MFH commits - it's a little bit of clutter but accurately
represents what happened in that subtree in the svn world.

> Just to make sure, you know that you can get this like so:
> % git log --reverse --format=%h master -- contrib/elftoolchain/ | head -1
> 37429c2aa7e7

For the email I sent I just reviewed all of the contrib/elftoolchain
history anyway, and looked at the last commit. Thanks for this though;
I suspect that if we try automating this we could add --merges.

> (not sure why using -n1 instead of head(1) results in the latest, not the oldest, commit. It seems to ignore --reverse)

Indeed, this looks like a git bug.

> Would be good if you could run a script against all contrib prefixes and later
> count the number of commits that a contrib-tree produces to see if something
> weird happens.

You mean try running `git subtree split` on each contrib prefix, and
checking that the number of commits in each generated tree is
sensible? For example, inspect any subtree with over say 500 commits?
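A minimal sketch of that check, assuming the candidate prefixes arrive one per line on standard input. `git subtree split --prefix=<dir> <commit>` prints the SHA of the synthesized subtree head, and `git rev-list --count` counts the commits reachable from it. The function names are hypothetical, and the 500-commit threshold is just the example figure above. The counts depend on which git-subtree (patched or unpatched) is first on PATH.

```sh
# Hypothetical helper: for each prefix read on stdin, print "<prefix> <count>",
# where <count> is the number of commits in the history produced by
# `git subtree split`. Prefixes that fail to split (e.g. plain files) are skipped.
count_split_commits() {
    while read -r prefix; do
        [ -n "$prefix" ] || continue
        # Redirect stdin so the git commands cannot consume the prefix list.
        split_head=$(git subtree split --prefix="$prefix" HEAD </dev/null) || continue
        printf '%s %s\n' "$prefix" "$(git rev-list --count "$split_head" </dev/null)"
    done
}

# Hypothetical filter: read "<prefix> <count>" lines on stdin and print the
# prefixes whose count exceeds the threshold given as $1 (default 500).
flag_large_subtrees() {
    awk -v max="${1:-500}" '$2 + 0 > max + 0 { print $1 }'
}
```

For example, `ls -1d contrib/* sys/contrib/* | count_split_commits | flag_large_subtrees 500` would list the prefixes worth inspecting by hand. Splitting every prefix is slow, so saving the intermediate counts with `tee` may be worthwhile.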

As a first pass for identifying contrib prefixes I tried:

ls -1d contrib/* sys/contrib/* crypto/* sys/crypto/* cddl/* sys/cddl/* sys/gnu/*

sys/crypto/ and the cddl ones aren't quite right, and I still need to
check for additional levels of hierarchy (e.g., cases like
contrib/netbsd/blocklist instead of contrib/blocklist).
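One way to approach the extra-hierarchy question is sketched below. It relies on an assumed heuristic, not an established rule: a directory under a root that holds only subdirectories (no regular files) is treated as a grouping level, like the hypothetical contrib/netbsd/ above, so its children are listed instead. `list_contrib_prefixes` is a hypothetical name.

```sh
# Hypothetical helper: print candidate subtree prefixes under each root given
# as an argument. Assumed heuristic: a directory containing no regular files
# at its top level is a grouping level, so its subdirectories are printed.
list_contrib_prefixes() {
    for root in "$@"; do
        for dir in "$root"/*/; do
            [ -d "$dir" ] || continue
            dir=${dir%/}
            if [ -n "$(find "$dir" -mindepth 1 -maxdepth 1 -type f | head -n 1)" ]; then
                # Has files at its top level: treat it as a prefix itself.
                printf '%s\n' "$dir"
            else
                # Only subdirectories: descend one level.
                for sub in "$dir"/*/; do
                    [ -d "$sub" ] && printf '%s\n' "${sub%/}"
                done
            fi
        done
    done
    return 0
}
```

Usage would be along the lines of `list_contrib_prefixes contrib sys/contrib crypto`. The output still needs manual review. A genuine vendor import whose top level happens to contain only subdirectories would be listed one level too deep.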

> You can test whether both parents are reachable from vendor/elftoolchain/dist,

I'm hoping to find an algorithm that could be made general and
submitted upstream, so that we could have something like git subtree
split --initial --prefix=contrib/elftoolchain, with --initial
calculating the --onto revision automatically. If we produce bespoke
tooling for FreeBSD, though, this branch-name approach should work,
but I think we'd need a map from contrib directory to vendor branch.
I believe some directories are not named the same in contrib and
vendor.
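A sketch of such a map, assuming a two-column text file of "<contrib prefix> <vendor path>" exceptions. Any prefix not listed falls back to an assumed default of vendor/<name>/dist, which matches contrib/elftoolchain and vendor/elftoolchain/dist but is not verified for other directories. The function name and file format are hypothetical.

```sh
# Hypothetical lookup: print the vendor branch path for a contrib prefix ($1).
# Exceptions come from an optional map file ($2) with lines of the form
# "<contrib prefix> <vendor path>"; otherwise fall back to the assumed
# convention vendor/<last path component>/dist.
vendor_branch_for() {
    prefix=${1%/}
    map=$2
    if [ -n "$map" ] && [ -r "$map" ]; then
        hit=$(awk -v p="$prefix" '$1 == p { print $2; exit }' "$map")
        if [ -n "$hit" ]; then
            printf '%s\n' "$hit"
            return 0
        fi
    fi
    printf 'vendor/%s/dist\n' "${prefix##*/}"
}
```

Keeping the exceptions in a data file rather than in the script means that only the directories whose names actually differ between contrib and vendor need an entry.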

> or look at their notes:
>
> % git log -n1 --format=%P 37429c2aa7e7 | xargs -n1 -I@ git log -n1 --format="%h %N" @
> 8a7f75c8fcc5 svn path=/head/; revision=260666
>
> 5265ace0e440 svn path=/vendor/elftoolchain/dist/; revision=260684
> svn path=/vendor/elftoolchain/elftoolchain-r2974/; revision=260685; tag=vendor/elftoolchain/elftoolchain-r2974

This seems like a simpler, workable approach for our tree - anything
with a note containing "svn path=/vendor" is a subtree commit.
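A sketch of the note-based check, assuming the svn2git notes are readable via `%N` (the default notes ref, or one selected with `--notes=<ref>`). It finds the first commit touching the prefix and prints the parents whose note mentions "svn path=/vendor", which would be candidates for --onto. It pipes into `head -n 1` instead of passing `-n1`, because git applies commit limiting such as `-n` before `--reverse`. Both function names are hypothetical.

```sh
# Succeeds if a note (as printed by git log --format=%N) contains
# "svn path=/vendor", i.e. the commit came from a vendor branch.
is_vendor_note() {
    case $1 in
        *"svn path=/vendor"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: print the vendor-branch parents of the first commit on
# branch $1 that touched prefix $2.
vendor_parents_of_first_commit() {
    branch=$1 prefix=$2
    # -n1 would be applied before --reverse and yield the newest commit,
    # so take the first line of the reversed list instead.
    first=$(git log --reverse --format=%H "$branch" -- "$prefix" | head -n 1)
    [ -n "$first" ] || return 1
    for parent in $(git log -n1 --format=%P "$first"); do
        note=$(git log -n1 --format=%N "$parent")
        if is_vendor_note "$note"; then
            printf '%s\n' "$parent"
        fi
    done
}
```

With the notes shown above, `vendor_parents_of_first_commit master contrib/elftoolchain/` would print the full hash of 5265ace0e440 and skip the /head/ parent 8a7f75c8fcc5.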

> For my own understanding, all the issues around subtree splitting are actually
> not blocking the conversion in any way, right? All they do is make life miserable
> for contrib-software maintainers, and they might delay new code drops under
> contrib/, yes?

It depends on your definition of "blocking" I think, but your
statement is generally true - we could use the existing cgit-beta
conversion, build releases from it, etc. In the current form, with
unpatched git-subtree, the bootstrap process will be quite awkward for
contrib software maintainers though.

I think we have three ways we can address this:

1. Change the svn2git process so that we don't trip over unpatched
git-subtree's issue with mainline history leaking into the subtree.
2. Get Tom Clarkson's git-subtree patches into upstream git, or
require that contrib maintainers use our own patched git until that
happens.
3. Develop and use an alternate subtree splitter.

I suppose there is also a fourth option, although I think there's
little appetite for it:

4. Reconsider git subtree altogether (e.g. submodules).

At this point I think that option 2 is the most straightforward, and
I'm now reasonably confident that it will work as we want. That being
the case, I'd say we should focus on tuning svn2git to produce
"sensible" output without regard to how unpatched git-subtree handles
it. That is, I'm broadly happy with the state of the conversion in
cgit-beta today.

