Monitoring commits on all branches

Marc Branchaud marcnarc at gmail.com
Thu Nov 19 22:50:17 UTC 2020


On 2020-11-19 12:16 p.m., Warner Losh wrote:
> 
> Thanks Marc! This is great advice... more comments below...
> 
> On Thu, Nov 19, 2020 at 9:16 AM Marc Branchaud <marcnarc at gmail.com>
> wrote:
> 
>     On 2020-11-18 8:49 p.m., Dan Langille wrote:
>      > How can a repo be monitored for commits on all branches?
>      >
>      > I know how to ask a given branch: do you have any commits after
>      > foo_hash?
>      >
>      > How do I:
>      >
>      > * get a list of all commits since foo_hash
> 
>     A quick note about Warner's reply:
> 
>      > git log $hash..HEAD
> 
>     "HEAD" is just a git nickname for "whatever you have currently
>     checked-out" (which can be a branch, a tag, or "detached" commit SHA
>     ID).
> 
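
A side note on that, since the original question was about *all* branches: 
HEAD only covers whatever is checked out.  Here's a minimal sketch, in a 
throwaway repo, of asking "git log" to walk every branch instead.  The 
branch name "quarterly", the file names, and the START variable (standing 
in for foo_hash) are all made up for illustration.

```shell
#!/bin/sh
# Sketch: list the commits made after a known commit on *any* branch.
# Everything happens in a temporary repo; all names here are hypothetical.
set -e

REPO=$(mktemp -d)
cd "$REPO"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo base > file; git add file; git commit -q -m "base"
START=$(git rev-parse HEAD)            # stands in for foo_hash
MAIN=$(git symbolic-ref --short HEAD)  # "main" or "master", depending on config

git checkout -q -b quarterly
echo q > qfile; git add qfile; git commit -q -m "quarterly-only change"

git checkout -q "$MAIN"
echo m > mfile; git add mfile; git commit -q -m "main-only change"

# --branches  walks every local branch (not just HEAD)
# --not START excludes START and everything reachable from it
# --source    shows the ref through which each commit was reached
RESULT=$(git log --oneline --source --branches --not "$START")
echo "$RESULT"
```

In a real clone you would probably want --remotes instead of --branches, 
so that origin/* is walked.  As far as I can tell, --source names only one 
ref per commit (the one through which the walk reached it), so treat it as 
a hint rather than a complete answer for commits that are reachable from 
several branches.
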
>      > * know which branch each of those commits was on (e.g. master,
>      > branches/2020Q4)
> 
>     Unfortunately you'll find most normal git advice to be a bit
>     frustrating with the FreeBSD repos, because FreeBSD doesn't work the
>     way most people use git.  Specifically, the FreeBSD project does not
>     ever merge branches (in the git sense of the word "merge").  Things
>     would be very, very much easier if the FreeBSD project were to use
>     git-style merging.  I believe there are discussions underway about
>     adjusting the whole MFC process for the git world.  I admit that part
>     of my motivation in writing this message is to provide grist for that
>     mill.
> 
> 
> FreeBSD src will be doing cherry-picks. There's only pain and suffering 
> from merge commits in this environment. Git's tools are adequate to cope 
> with individual and squashed cherry picks.

Fair enough.  I'm also sure that the git community would welcome patches 
that help make FreeBSD's workflow a bit smoother.

>     Fortunately even without git-merged branches, there are still git tools
>     that help, though they're not as precise as one would like.
> 
> 
> They are for src. I suspect for ports they might not be.
> 
>     Let's look at a concrete example with the beta ports git repo (which I
>     just cloned), and compare the 2020Q4 and main branches.  I'll start
>     with some overall exploration, then address your specific question.
> 
>     There are 299 commits in the 2020Q4 branch.  I know this because
>           git merge-base origin/main origin/branches/2020Q4
>     tells me where 2020Q4 branched off of main: commit 5dbe4e5f775ea2.  And
>           git rev-list 5dbe4e5f775ea2..origin/branches/2020Q4 | wc -l
>     says "299".  (The "rev-list" command is a bare-bones version of "log"
>     that only lists commit SHA IDs.)
> 
>     Meanwhile there have been 4538 commits to the main branch since commit
>     5dbe4e5f775ea2.
> 
>     As far as git is concerned, those 299 commits in 2020Q4 are *different*
>     from anything in main.  Even though most of them made the exact same
>     code changes, they were created at different times, often by different
>     authors, and they have different commit messages.
> 
> 
> True.
> 
>     But you can still ask git to look at the code-change level to see which
>     2020Q4 commits exactly replicated the code change from main:
> 
>           git cherry -v origin/main origin/branches/2020Q4
> 
>     This little piece of magic looks at the 299 commits in 2020Q4 that are
>     not in main and compares their code changes to the 4538 commits in main
>     that are not in 2020Q4.  It prints out the 299 2020Q4 commit SHA IDs,
>     prefixed with either a "- " or a "+ ".  The -v appends the commit
>     message's first line:
> 
>           - 394d9746e5eea73f56334b2e7ddbdc8f686d6541 MFH: r550869
>           + 1ac9571956759c91d852ee92859a12e52dcbde48 MFH: r550885 r550886
>           - fd411bdfda55488b84de75e6b043c513a281abf0 MFH: r551209
>           - 533cdaa97457b3318aebcc53f7a1a46ea66721da MFH: r551236
>           ......
> 
>     A "-" means that the commit matches the code change made by a commit in
>     main, while a "+" means that the commit's code change does not
>     *exactly* match any main commit since commit 5dbe4e5f775ea2.
> 
>     So
>           git cherry -v origin/main origin/branches/2020Q4 | grep ^-
>     shows us the 234 2020Q4 commits that made the exact same change as a
>     commit in main.
> 
>     And
>           git cherry -v origin/main origin/branches/2020Q4 | grep ^+
>     shows us that there are 41 not-exactly-the-same-change commits in
>     2020Q4.  Mostly these are ones that combined two or more MFH's into one
>     commit (e.g. 2020Q4 commit 1ac95719567), or that changed a file in a
>     slightly different way (see the first patch hunk of 2020Q4 commit
>     cbd002878f2, compared to its counterpart in main: commit a5d21ea16b6).
> 
> 
> Yes. These sorts of issues are why merge commits aren't always the 
> right way to go: we're not merging the entire history together 
> (doing a join), but rather just small subsets of it. How to cope with 
> our ports tree, which is mostly small files that look much the same, 
> when git's guessing does a poor job on such a tree is an interesting 
> problem to solve. Merge commits can help with some of the issue, but 
> they can create other issues as well when done incorrectly....

I admit I don't quite follow you there, but I'm particularly ignorant of 
the ports tree.  I have some quite-likely-stupid ideas after having 
played with it for 10 minutes while composing my earlier message, but 
even if the ideas are somehow clever I suspect they'd entail too much 
workflow change to be palatable.

> Even so, great hints for how to find cherry picked items. I suspect 
> we'll need to have some tooling that embeds hash(es) into the commit 
> message in some stylized way to allow tracking the non-trivial patch 
> changes that sometimes happen: squashing several cherry picks, necessary 
> differences due to branch drift, etc. It's unclear how we should do 
> this, though, in a way that works well, is reliable and doesn't add 
> undue friction to the process...

It's traditional when doing a cherry-pick to record the original commit 
in the new commit's message.  The "cherry-pick" command has a -x option 
that does this automatically, by appending a line of the form
	(cherry picked from commit <SHA ID>)
to the new commit's message.  (There's also a "git interpret-trailers" 
command that is a general-purpose tool for manipulating "Foo: blah blah" 
lines in commit messages.)
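
To make that concrete, here is a small sketch that runs in a throwaway 
repo.  It cherry-picks one commit with -x and prints the resulting 
message, which ends with a "(cherry picked from commit <SHA ID>)" line. 
It then pipes that message through "git interpret-trailers" to add one 
more line.  The branch name "quarterly", the file names, and the 
"Reviewed-by" trailer are placeholders I made up, not anything the 
project has agreed on.

```shell
#!/bin/sh
# Sketch: record a cherry-pick's origin with -x, then add a trailer.
# Runs entirely in a temporary repo; all names here are hypothetical.
set -e

REPO=$(mktemp -d)
cd "$REPO"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo base > file; git add file; git commit -q -m "base"
MAIN=$(git symbolic-ref --short HEAD)
git branch quarterly                   # branch point

echo fix > fixfile; git add fixfile; git commit -q -m "fix a bug"
FIX=$(git rev-parse HEAD)              # the commit to be cherry-picked

git checkout -q quarterly
git cherry-pick -x "$FIX" >/dev/null

# The new commit's message, including the line appended by -x.
PICKED_MSG=$(git log -1 --format=%B)
echo "$PICKED_MSG"

# interpret-trailers reads a message on stdin and writes the edited
# message to stdout; it does not touch the commit itself.
WITH_TRAILER=$(echo "$PICKED_MSG" |
    git interpret-trailers --trailer "Reviewed-by: Someone <someone@example.com>")
echo "$WITH_TRAILER"
```

A monitoring script could then recover the original SHA ID by looking for 
that "(cherry picked from commit ...)" line in the new commit's message. 
That stays reliable even when the code change itself had to be adjusted 
for the branch.
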

"git cherry-pick" might actually lead people away from squashing 
together multiple changes into one commit, because you have to make a 
bit of an effort to get cherry-pick to squash things up.  I personally 
think the project would benefit from discouraging squashed-together MFC's.

>     Now to your specific question: Given a commit, how can we tell which
>     branches contain that code change?  Let's look at main commit
>     6a9a8389d609 which I've determined, through manual spelunking, matches
>     2020Q4's commit 02eba4048564.
> 
>     At a basic level, "git cherry" can tell us that *something* in 2020Q4
>     made the same change as commit 6a9a8389d609.  Here I reversed the order
>     of the branch names in the command:
>           git cherry origin/branches/2020Q4 origin/main | grep 6a9a8389d609
>     This outputs:
>           - 6a9a8389d609ca0370c8c6eb8f993c1aa4071681
>     and the "-" tells me that 6a9a8389d609's code change is *somewhere* in
>     2020Q4's 299 unique commits.
> 
>     Unfortunately there's no convenient git command that'll tell you
>     *which* 2020Q4 commit replicated commit 6a9a8389d609.  For that, we
>     need to do a bit of scripting:
> 
>     -----8<-----8<-----8<-----8<-----
> 
>     #!/bin/sh
> 
>     TARGET="6a9a8389d609"
> 
>     BASE=`git merge-base origin/branches/2020Q4 origin/main`
> 
>     TARGET_PATCH_ID=`git show -p $TARGET | git patch-id --stable | cut -f 1 -d ' '`
> 
>     for REV in `git rev-list $BASE..origin/branches/2020Q4`; do
>          PATCH_ID=`git show -p $REV | git patch-id --stable | cut -f 1 -d ' '`
>          if [ "$PATCH_ID" = "$TARGET_PATCH_ID" ]; then
>             echo "Found a commit that replicated target commit $TARGET:"
>             echo
>             git show -s $REV
>             exit 0
>          fi
>     done
> 
>     echo "Did not find any commit that exactly replicated $TARGET."
>     exit 1
> 
>     ----->8----->8----->8----->8-----
> 
>     This only looks at the 2020Q4 branch, but it's easily adapted to look
>     at a user-specified branch, or multiple branches.  (In the above I
>     used "git patch-id", which is what "git cherry" uses internally to
>     identify a commit's code changes.)
> 
>     I hope all this helps a bit!
> 
> 
> It does. I thought I'd had my head deep into git, but hadn't stumbled 
> upon this.

I've been using git for over 10 years, and I still discover new things. 
This "git cherry" stuff, for example, I've only started using a little 
bit in the last few months.
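
Since I said the script was easily adapted to multiple branches, here is 
one rough way that might look.  It keeps the same "git patch-id" 
comparison, wrapped in two helper functions whose names (patch_id, 
find_replicas) are my own invention.  One difference from the script 
quoted above: it takes the merge-base of the target commit and each 
branch, so you don't have to name main.  The demo builds a throwaway repo 
with made-up branches "q3" and "q4" so it can be run anywhere.

```shell
#!/bin/sh
# Sketch: find which branches contain a commit that replicates TARGET's
# code change.  Helper names and the demo repo are hypothetical.
set -e

# patch_id REV: print the stable patch ID of REV's code change.
patch_id() {
    git show -p "$1" | git patch-id --stable | cut -f 1 -d ' '
}

# find_replicas TARGET BRANCH...: print "BRANCH SHA" for every commit
# on a listed branch (since it diverged from TARGET) whose patch ID
# equals TARGET's.
find_replicas() {
    fr_target=$1; shift
    fr_target_id=$(patch_id "$fr_target")
    for fr_branch in "$@"; do
        fr_base=$(git merge-base "$fr_target" "$fr_branch")
        for fr_rev in $(git rev-list "$fr_base..$fr_branch"); do
            if [ "$(patch_id "$fr_rev")" = "$fr_target_id" ]; then
                echo "$fr_branch $fr_rev"
            fi
        done
    done
}

# --- demo in a temporary repo ---
REPO=$(mktemp -d)
cd "$REPO"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo base > file; git add file; git commit -q -m "base"
MAIN=$(git symbolic-ref --short HEAD)
git branch q3                          # never receives the fix
git branch q4                          # receives the fix via cherry-pick

echo fix > fixfile; git add fixfile; git commit -q -m "fix a bug"
FIX=$(git rev-parse HEAD)

git checkout -q q4
echo q4 > q4file; git add q4file; git commit -q -m "q4-only change"
git cherry-pick "$FIX" >/dev/null
PICK=$(git rev-parse HEAD)
git checkout -q "$MAIN"

FOUND=$(find_replicas "$FIX" q3 q4)
echo "$FOUND"

# Cross-check with "git cherry": a leading "-" means the change is in q4.
CHERRY=$(git cherry q4 "$MAIN")
echo "$CHERRY"
```

Like the original script, this computes a patch ID for every candidate 
commit, so on a branch with thousands of commits it will be slow.  It 
also only finds *exact* replicas, the same limitation as "git cherry".
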

> It looks useful enough I'll try to add a section to my FAQ.

I'm honoured!

		M.


