Using git branches for ports (was: Re: converting rmport to git)

Wed Dec 2 16:32:35 UTC 2020

Top posting... I'm not sure how productive a blow-by-blow reply would be,
since it seems to be fraught.

With respect, I think this is too complicated to be workable. git merges
don't work quite the way you think they do (merging an individual commit
isn't a thing in git, it's specifically called out as a cherry-pick for a
reason). Such a complex merge history will create way more issues than it
solves, imho. "proper git merges" is not something that we should strive
for, since merges in git are very very different than they are in
subversion (where proper merges are preferable). Many of the things that
you call 'merges' would more properly be done as cherry picks. There's no
benefit from forcing them to be merges, and IMHO it creates nothing
but trouble.
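
To make the merge-versus-cherry-pick distinction concrete, here is a minimal
sketch in a throwaway repository. The branch names (port, merged, picked) and
the file names are made up for illustration; nothing here comes from the
actual ports repo. It assumes a reasonably recent git is installed.

```shell
#!/bin/sh
# Throwaway demo: merging a commit drags in its whole ancestry,
# while cherry-picking copies only that one change.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.invalid"
git config user.name "Demo"
# Hypothetical stand-in for a per-port branch.
git symbolic-ref HEAD refs/heads/port

# Commits a..d, each adding one file.
for c in a b c d; do
    echo "$c" > "$c.txt"
    git add "$c.txt"
    git commit -q -m "commit $c"
done

# Two release-style branches, both forked at commit b.
git branch merged port~2
git branch picked port~2

# "Merging commit d" really merges d plus everything before it (here: c).
git checkout -q merged
git merge -q --no-ff -m "merge d" port

# Cherry-picking d applies only d's own change.
git checkout -q picked
git cherry-pick port >/dev/null

echo "merged:" $(git ls-tree --name-only merged)
echo "picked:" $(git ls-tree --name-only picked)
# Only the merged branch has the original d commit in its history.
echo "contains d:" $(git branch --format='%(refname:short)' --contains port)
```

In this sketch the merged branch ends up with c.txt even though only d was
wanted, and the picked branch gets d.txt alone. The flip side is that
"git branch --contains" no longer reports the picked branch, because the
cherry-pick is a new commit with a different hash.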

There are also issues with complex ports that span dozens of different
directories, which would be problematic.

There may be ways that we can use branches in a productive way, but I'm
having trouble seeing how this proposal accomplishes that.

Warner

On Wed, Dec 2, 2020 at 9:08 AM Marc Branchaud <marcnarc at gmail.com> wrote:

> On 2020-12-01 11:36 a.m., Warner Losh wrote:
> >
> > To be honest, though, I think this is an area where some
> > experimentation to understand the alternatives is needed because this
> > use case is relatively rare in the larger open source community.
>
> OK, so I just have to ask (and I apologize if I'm opening a can of worms
> that has already been discussed, or that nobody wants to look at; I'll
> drop this if it's just noise):
>
> Have you considered using a branch for each port?  Yes, I'm talking
> about 41,000+ branches.  Git should not have any trouble dealing with that.
>
> There are a few advantages to this approach:
>
> * Each port's change history is fully isolated and easy to track.
> (Don't worry about having lots of near-duplicate files in different
> branches or directories, as git is very efficient at dealing with this.)
>
> * MFCs are proper git merges, which means that it is very easy to
> understand which changes have landed where.
>
> * Cases like removing/re-adding a port would take place on that port's
> branch, making it obvious just how that work was done.
>
> If this sounds appealing, then the real question is whether or not this
> approach trips up any important cases that arise when working on ports.
>
> I can't answer that, but in the grand tradition of git branch ASCII-art,
> here is a pretty picture to help understand what this approach might
> look like.  In the following:
>   - "a" thru "f" represent commits of some work on a port (net/gsk, for
> example).
>   - "M#" represent git merge commits of some net/gsk changes.
>   - "m" represents a git merge commit of some other port's changes.
> My proposed branch names are on the left.  Commit history proceeds from
> left to right.
>
>         main ....--m---m--M1--m--M2--m--M3--m
>                          /      /      /
>         net/gsk ....-a--b---c--d--e---f
>                          \
>         2020Q4 ....---m---M4---m---m
>
> So the net/gsk port evolves on its own "net/gsk" branch, with commits
> a..f.  We see that the a and b changes were merged into the "main"
> branch by merge-commit M1.  Merge commit M2 brought in changes c and d,
> and then merge M3 brought changes e and f into "main".  Meanwhile, only
> changes a and b have been merged into the "2020Q4" branch (commit M4).
>
> Both the "main" and "2020Q4" branches also contain merges from other
> ports' branches (the "m" commits).  The mainline branches ("main",
> "2020Q4", etc) would consist almost entirely of merge commits.
>
> The net/gsk changes in the mainline branches can be easily obtained from
> simple git commands.  To see the net/gsk work that has happened in a
> mainline branch like 2020Q4, just do
>         git log 2020Q4 -- net/gsk
> That will list commits a, b and M4.  No need to do any patch-level
> analysis.
>
> That command will also work with the existing git repo migrated from
> svn.  But the branch-based model has some additional power.  For
> example, a command like
>         git log --oneline --graph 2020Q4 -- net/gsk
> will output an ASCII-art picture of the 2020Q4 branch's view of the
> net/gsk port, similar to what I drew above.
>
> More importantly, it's easy to see where any particular piece of the
> net/gsk work has landed:
>         git branch -a --contains <b>
> would report the "main" and "2020Q4" branches (here the <b> is the
> SHA-ID of commit b).  No need to deal with "combined" MFCs or
> did-this-change-match-that-patch problems.
>
> What about the rmport script?  The branches I'm describing contain the
> full ports tree -- they're not "partial" or "sparse" in any way.  So to
> remove the net/gsk port, rmport would just check out the "net/gsk" branch
> and do the removal there.  Then that can be merged (manually or
> automatically) into whatever mainline branch is desired.  There's no
> need to remove the "net/gsk" branch though, and it's better to keep it
> around in case someone wants to revive the net/gsk port in the future.
>
> This branch-based model can be adopted atop the transitioned ports repo
> as it stands today.  There's no need (nor is it possible) to
> retroactively translate the svn history into this structure.  Sure, the
> migrated svn history isn't amenable to tricks like "git branch
> --contains", but that will become less important as time marches on.
> And the migrated history can still be teased out using patch-level
> commands like "git cherry".
>
> Those are my main points, so you can stop reading here if you're already
> annoyed!  I'm now going to delve into some of the flexibility that this
> approach offers.
>
>
> In this model the net/gsk port is free to evolve as it needs to in the
> "net/gsk" branch.  From the above we see that changes a and b were
> deemed good enough to put into 2020Q4, but changes c-f are still a bit
> experimental and they're still being validated on the "main" branch.
> (I'm making some assumptions here about how people develop the ports.
> Apologies if I got it wrong; I'm sure this model can accommodate a
> different workflow.)
>
> In fact, that "net/gsk" branch can itself contain sub-branches for
> special circumstances.  Let's say that commit b has a bug.  We'd like to
> fix that bug in both "main" and "2020Q4", but if we just plop the fix
> onto the tip of the "net/gsk" branch (as commit g, say) that change will
> have commits c-f as part of its history:
>
>         main ....--m---m--M1--m--M2--m--M3--m
>                          /      /      /
>         net/gsk ....-a--b---c--d--e---f---g
>                          \
>         2020Q4 ....---m---M4---m---m
>
> If we just merged g into 2020Q4, we'd also bring in the c-f changes
> which we do not want to have on the 2020Q4 branch.
>
> So instead, we can fix the bug in a mini branch based on the b commit,
> then merge that work where it's needed:
>
>         main ....--m---m--M1--m--M2--m--M3--m--M6
>                          /      /      /      /
>         net/gsk ....-a--b---c--d--e---f------g'
>                          |\                  /
>                          | \---------b'-----/
>                          \           \
>         2020Q4 ....---m---M4---m---m--M5
>
> Here we've fixed the bug with commit b', which is based directly on
> commit b and so we can merge b' into the "2020Q4" branch (commit M5),
> with the confidence that we're only bringing in the exact bug fix we
> need.  Meanwhile, we also merge b' onto the tip of the "net/gsk" branch
> (commit g'), fixing the bug on the port's own branch, and then merge g'
> into the "main" branch as commit M6.  It is completely clear what
> happened to the net/gsk port, and how those changes were brought into
> the mainline branches.
>
> One last wrinkle about this picture:  Note how I did not put a name on
> the branch with the b' commit.  Git is perfectly happy to deal with this
> kind of anonymous branching, and so there's no need to pollute the
> central FreeBSD-ports repository with names for these kinds of branches.
> But that does not prevent the net/gsk developer from having a *local*
> name for that branch in their own, local clone of the repository.  The
> developer can name their local branch whatever makes sense to them.
> When they push one of the merge commits (M5, g' or M6) to the central
> repo, the b' commit rides along but without the developer's local branch
> name.  The history recorded in the central repository is as depicted,
> with b' living on a nameless branch.
>
> I can't believe you've read all of this!  Thanks!
>
>                 M.
>
>