Using git branches for ports (was: Re: converting rmport to git)

Wed Dec 2 16:08:12 UTC 2020

On 2020-12-01 11:36 a.m., Warner Losh wrote:
> 
> To be honest, though, I think this is an area where some experimentation to
> understand the alternatives is needed because this use case is relatively
> rare in the larger open source community.

OK, so I just have to ask (and I apologize if I'm opening a can of worms 
that has already been discussed, or that nobody wants to look at; I'll 
drop this if it's just noise):

Have you considered using a branch for each port?  Yes, I'm talking 
about 41,000+ branches.  Git should not have any trouble dealing with that.

There are a few advantages to this approach:

* Each port's change history is fully isolated and easy to track. 
(Don't worry about having lots of near-duplicate files in different 
branches or directories, as git is very efficient at dealing with this.)

* MFCs are proper git merges, which means that it is very easy to 
understand which changes have landed where.

* Cases like removing/re-adding a port would take place on that port's 
branch, making it obvious just how that work was done.

If this sounds appealing, then the real question is whether or not this 
approach trips up any important cases that arise when working on ports.

I can't answer that, but in the grand tradition of git branch ASCII-art, 
here is a pretty picture to help understand what this approach might 
look like.  In the following:
  - "a" thru "f" represent commits of some work on a port (net/gsk, for 
example).
  - "M#" represent git merge commits of some net/gsk changes.
  - "m" represents a git merge commit of some other port's changes.
My proposed branch names are on the left.  Commit history proceeds from 
left to right.

	main ....--m---m--M1--m--M2--m--M3--m
	                 /      /      /
	net/gsk ....-a--b---c--d--e---f
	                 \
	2020Q4 ....---m---M4---m---m

So the net/gsk port evolves on its own "net/gsk" branch, with commits 
a..f.  We see that the a and b changes were merged into the "main" 
branch by merge-commit M1.  Merge commit M2 brought in changes c and d, 
and then merge M3 brought changes e and f into "main".  Meanwhile, only 
changes a and b have been merged into the "2020Q4" branch (commit M4).

Both the "main" and "2020Q4" branches also contain merges from other 
ports' branches (the "m" commits).  The mainline branches ("main", 
"2020Q4", etc) would consist almost entirely of merge commits.

The net/gsk changes in the mainline branches can be easily obtained from 
simple git commands.  To see the net/gsk work that has happened in a 
mainline branch like 2020Q4, just do
	git log 2020Q4 -- net/gsk
That will list commits a and b (add --full-history to see the merge 
commit M4 as well).  No need to do any patch-level analysis.

That command will also work with the existing git repo migrated from 
svn.  But the branch-based model has some additional power.  For 
example, a command like
	git log --oneline --graph 2020Q4 -- net/gsk
will output an ASCII-art picture of the 2020Q4 branch's view of the 
net/gsk port, similar to what I drew above.
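
For anyone who wants to try this without touching the real ports tree, 
here is a minimal sketch that builds a throwaway repository holding a 
reduced version of the picture above (only commits a, b and M4; the 
"root" and "m" commits, the file names and the committer identity are 
placeholders of my own) and then runs both commands.  One caveat it 
makes visible: when limited to a path, git log simplifies history by 
default, so a merge like M4 that brings in the port's changes unmodified 
is only listed if you add --full-history.

```shell
#!/bin/sh
# Sketch only: a scratch repo shaped like a reduced version of the diagram.
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the initial branch "main"
git config user.name "Port Hacker"         # placeholder identity
git config user.email "ports@example.invalid"

# commit SUBJECT FILE: append a line to FILE and commit it.
commit() {
    mkdir -p "$(dirname "$2")"
    echo "$1" >> "$2"
    git add -- "$2"
    git commit -q -m "$1"
}

commit root README
git branch net/gsk
git branch 2020Q4

# The port evolves on its own branch.  The trailing "--" tells checkout
# that net/gsk is a branch name, since a directory of that name exists too.
git checkout -q net/gsk --
commit a net/gsk/Makefile
commit b net/gsk/Makefile

# The quarterly branch gets an unrelated change "m", then merge M4.
git checkout -q 2020Q4 --
commit m misc/other/Makefile
git merge -q --no-ff -m M4 net/gsk

# Default path-limited log: the port's own commits (b, then a).
default_log=$(git log --format=%s 2020Q4 -- net/gsk)
# With --full-history the merge commit M4 is listed as well.
full_log=$(git log --format=%s --full-history 2020Q4 -- net/gsk)

printf 'default:\n%s\n\nfull history:\n%s\n\n' "$default_log" "$full_log"
git log --oneline --graph 2020Q4 -- net/gsk
```

Everything lives under a fresh mktemp directory, so the script can be 
run repeatedly and thrown away afterwards.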

More importantly, it's easy to see where any particular piece of the 
net/gsk work has landed:
	git branch -a --contains <b>
would report the "main" and "2020Q4" branches, along with "net/gsk" 
itself (here the <b> is the SHA-ID of commit b).  No need to deal with 
"combined" MFCs or did-this-change-match-that-patch problems.
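
A minimal sketch of that query, again in a scratch repository (the 
commit contents, file names and identity are placeholders; only the 
branch names and commit labels come from the picture).  It records the 
SHA-IDs of commits b and c and asks which branches contain each one.

```shell
#!/bin/sh
# Sketch only: which branches has a given port commit landed on?
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Port Hacker"         # placeholder identity
git config user.email "ports@example.invalid"

# commit SUBJECT FILE: append a line to FILE and commit it.
commit() {
    mkdir -p "$(dirname "$2")"
    echo "$1" >> "$2"
    git add -- "$2"
    git commit -q -m "$1"
}

commit root README
git branch net/gsk
git branch 2020Q4

git checkout -q net/gsk --
commit a net/gsk/Makefile
commit b net/gsk/Makefile
b=$(git rev-parse HEAD)

# a and b go to both mainline branches (M1 and M4).
git checkout -q main --
git merge -q --no-ff -m M1 net/gsk
git checkout -q 2020Q4 --
git merge -q --no-ff -m M4 net/gsk

# c only goes to main (M2).
git checkout -q net/gsk --
commit c net/gsk/Makefile
c=$(git rev-parse HEAD)
git checkout -q main --
git merge -q --no-ff -m M2 net/gsk

has_b=$(git branch --contains "$b" --format='%(refname:short)')
has_c=$(git branch --contains "$c" --format='%(refname:short)')

echo "branches containing b:" $has_b   # 2020Q4 main net/gsk
echo "branches containing c:" $has_c   # main net/gsk
```

The --format option is only there to get bare branch names for 
printing; plain "git branch --contains" gives the same answer.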

What about the rmport script?  The branches I'm describing contain the 
full ports tree -- they're not "partial" or "sparse" in any way.  So to 
remove the net/gsk port, rmport would just check out the "net/gsk" branch 
and do the removal there.  Then that can be merged (manually or 
automatically) into whatever mainline branch is desired.  There's no 
need to remove the "net/gsk" branch though, and it's better to keep it 
around in case someone wants to revive the net/gsk port in the future.
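
To make that concrete, here is a rough sketch of the removal step. 
The remove_port function is a hypothetical stand-in of my own, not the 
real rmport script, and the repository is a scratch one with placeholder 
contents.  It removes the port on its own branch, merges the removal 
into a mainline branch, and leaves the port branch in place.

```shell
#!/bin/sh
# Sketch only: remove a port on its own branch, then merge the removal.
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Port Hacker"         # placeholder identity
git config user.email "ports@example.invalid"

# commit SUBJECT FILE: append a line to FILE and commit it.
commit() {
    mkdir -p "$(dirname "$2")"
    echo "$1" >> "$2"
    git add -- "$2"
    git commit -q -m "$1"
}

commit root README
git branch net/gsk
git checkout -q net/gsk --
commit a net/gsk/Makefile
git checkout -q main --
git merge -q --no-ff -m M1 net/gsk

# remove_port PORT MAINLINE (hypothetical helper, not the real rmport):
# delete PORT's directory on the branch named PORT, then merge that
# removal into MAINLINE.
remove_port() {
    git checkout -q "$1" --
    git rm -q -r -- "$1"
    git commit -q -m "Remove $1"
    git checkout -q "$2" --
    git merge -q --no-ff -m "Merge removal of $1" "$1"
}

remove_port net/gsk main

# The port is gone from main's tree, but its branch is still there ...
git branch --format='%(refname:short)'
# ... and the last pre-removal version can still be read from it.
git show "refs/heads/net/gsk~1:net/gsk/Makefile"
```

Reviving the port later would then be a matter of restoring the files 
on the "net/gsk" branch and merging again.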

This branch-based model can be adopted atop the transitioned ports repo 
as it stands today.  There's no need (nor is it possible) to 
retroactively translate the svn history into this structure.  Sure, the 
migrated svn history isn't amenable to tricks like "git branch 
--contains", but that will become less important as time marches on. 
And the migrated history can still be teased out using patch-level 
commands like "git cherry".
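
As a small illustration of the patch-level approach, the sketch below 
mimics the migrated history in a scratch repository: a change is 
committed on "main" and then copied to "2020Q4" with cherry-pick, so the 
two commits have different SHA-IDs.  The commit subjects, the www/foo 
port and the identity are made up for the example.  "git cherry" then 
marks with "-" the commits on main whose patch already exists on 2020Q4, 
and with "+" those that have not been copied.

```shell
#!/bin/sh
# Sketch only: patch-level comparison for history that was copied, not merged.
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Port Hacker"         # placeholder identity
git config user.email "ports@example.invalid"

# commit SUBJECT FILE: append a line to FILE and commit it.
commit() {
    mkdir -p "$(dirname "$2")"
    echo "$1" >> "$2"
    git add -- "$2"
    git commit -q -m "$1"
}

commit root README
git branch 2020Q4

# Two changes land on main.
commit "update net/gsk" net/gsk/Makefile
x=$(git rev-parse HEAD)
commit "update www/foo" www/foo/Makefile

# Only the first is copied to the quarterly branch.  -x appends a
# "(cherry picked from commit ...)" line to the message.
git checkout -q 2020Q4 --
git cherry-pick -x "$x" >/dev/null

# Compare by patch content; cut drops the SHA column for readability.
report=$(git cherry -v 2020Q4 main | cut -d' ' -f1,3-)
echo "$report"
```

This works, but it relies on the patches matching exactly, which is the 
kind of analysis the branch-based model avoids.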

Those are my main points, so you can stop reading here if you're already 
annoyed!  I'm now going to delve into some of the flexibility that this 
approach offers.

In this model the net/gsk port is free to evolve as it needs to in the 
"net/gsk" branch.  From the above we see that changes a and b were 
deemed good enough to put into 2020Q4, but changes c-f are still a bit 
experimental and they're still being validated on the "main" branch. 
(I'm making some assumptions here about how people develop the ports. 
Apologies if I got it wrong; I'm sure this model can accommodate a 
different workflow.)

In fact, that "net/gsk" branch can itself contain sub-branches for 
special circumstances.  Let's say that commit b has a bug.  We'd like to 
fix that bug in both "main" and "2020Q4", but if we just plop the fix 
onto the tip of the "net/gsk" branch (as commit g, say) that change will 
have commits c-f as part of its history:

	main ....--m---m--M1--m--M2--m--M3--m
	                 /      /      /
	net/gsk ....-a--b---c--d--e---f---g
	                 \
	2020Q4 ....---m---M4---m---m

If we just merged g into 2020Q4, we'd also bring in the c-f changes 
which we do not want to have on the 2020Q4 branch.

So instead, we can fix the bug in a mini branch based on the b commit, 
then merge that work where it's needed:

	main ....--m---m--M1--m--M2--m--M3--m--M6
	                 /      /      /      /
	net/gsk ....-a--b---c--d--e---f------g'
                         |\                  /
                         | \---------b'-----/
	                 \           \
	2020Q4 ....---m---M4---m---m--M5

Here we've fixed the bug with commit b', which is based directly on 
commit b and so we can merge b' into the "2020Q4" branch (commit M5), 
with the confidence that we're only bringing in the exact bug fix we 
need.  Meanwhile, we also merge b' onto the tip of the "net/gsk" branch 
(commit g'), fixing the bug on the port's own branch, and then merge g' 
into the "main" branch as commit M6.  It is completely clear what 
happened to the net/gsk port, and how those changes were brought into 
the mainline branches.
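
Here is a minimal sketch of that sequence in a scratch repository.  The 
local branch name "gsk-fix", the file names and the identity are my own 
placeholders, and the picture is reduced to commits a, b and c.  I have 
the fix touch a different file than commit c so that the merges are 
conflict-free; a real fix might need conflict resolution at g'.

```shell
#!/bin/sh
# Sketch only: fix a bug in commit b on a mini branch, merge where needed.
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Port Hacker"         # placeholder identity
git config user.email "ports@example.invalid"

# commit SUBJECT FILE: append a line to FILE and commit it.
commit() {
    mkdir -p "$(dirname "$2")"
    echo "$1" >> "$2"
    git add -- "$2"
    git commit -q -m "$1"
}

commit root README
git branch net/gsk
git branch 2020Q4

git checkout -q net/gsk --
commit a net/gsk/Makefile
commit b net/gsk/Makefile
b=$(git rev-parse HEAD)
git checkout -q main --
git merge -q --no-ff -m M1 net/gsk
git checkout -q 2020Q4 --
git merge -q --no-ff -m M4 net/gsk

git checkout -q net/gsk --
commit c net/gsk/Makefile
c=$(git rev-parse HEAD)
git checkout -q main --
git merge -q --no-ff -m M2 net/gsk

# The mini branch starts at commit b, not at the tip of net/gsk.
git checkout -q -b gsk-fix "$b"
commit "b'" net/gsk/pkg-descr
fix=$(git rev-parse HEAD)

git checkout -q 2020Q4 --
git merge -q --no-ff -m M5 gsk-fix         # only the fix reaches 2020Q4
git checkout -q net/gsk --
git merge -q --no-ff -m "g'" gsk-fix       # fix joins the port's branch
git checkout -q main --
git merge -q --no-ff -m M6 net/gsk         # and then reaches main

git log --oneline --graph --all
```

Afterwards "git merge-base --is-ancestor" can confirm that b' is in 
both mainline branches while c is still absent from 2020Q4.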

One last wrinkle about this picture:  Note how I did not put a name on 
the branch with the b' commit.  Git is perfectly happy to deal with this 
kind of anonymous branching, and so there's no need to pollute the 
central FreeBSD-ports repository with names for these kinds of branches. 
But that does not prevent the net/gsk developer from having a *local* 
name for that branch in their own, local clone of the repository.  The 
developer can name their local branch whatever makes sense to them. 
When they push one of the merge commits (M5, g' or M6) to the central 
repo, the b' commit rides along but without the developer's local branch 
name.  The history recorded in the central repository is as depicted, 
with b' living on a nameless branch.
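
A minimal sketch of that, using a local bare repository as a stand-in 
for the central repo (the paths, the "gsk-fix" name, the file names and 
the identity are all placeholders).  Only the "2020Q4" branch is pushed; 
the b' commit arrives with it, and the local branch name does not.

```shell
#!/bin/sh
# Sketch only: a locally named mini branch is pushed without its name.
set -e
top=$(mktemp -d)
git init -q --bare "$top/central.git"      # stand-in for the central repo
git init -q "$top/work"
cd "$top/work"
git symbolic-ref HEAD refs/heads/main
git config user.name "Port Hacker"         # placeholder identity
git config user.email "ports@example.invalid"
git remote add origin "$top/central.git"

# commit SUBJECT FILE: append a line to FILE and commit it.
commit() {
    mkdir -p "$(dirname "$2")"
    echo "$1" >> "$2"
    git add -- "$2"
    git commit -q -m "$1"
}

commit root README
git branch net/gsk
git branch 2020Q4

git checkout -q net/gsk --
commit b net/gsk/Makefile
b=$(git rev-parse HEAD)
git checkout -q 2020Q4 --
git merge -q --no-ff -m M4 net/gsk

# Local-only name for the mini branch holding the fix.
git checkout -q -b gsk-fix "$b"
commit "b'" net/gsk/pkg-descr
fix=$(git rev-parse HEAD)

git checkout -q 2020Q4 --
git merge -q --no-ff -m M5 gsk-fix
git push -q origin 2020Q4                  # push the mainline branch only

# What the central repo sees: one branch name, but b' is present.
central_branches=$(git --git-dir="$top/central.git" \
    branch --format='%(refname:short)')
echo "central branches: $central_branches"
git --git-dir="$top/central.git" cat-file -t "$fix"
```

The developer is free to delete or keep "gsk-fix" locally; the central 
history is the same either way.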

I can't believe you've read all of this!  Thanks!

		M.