Monitoring commits on all branches

Thu Nov 19 22:30:23 UTC 2020

On 2020-11-19 2:00 p.m., Dan Langille wrote:
>> On Nov 19, 2020, at 11:16 AM, Marc Branchaud <marcnarc at gmail.com>
>> wrote:
>> 
>> On 2020-11-18 8:49 p.m., Dan Langille wrote:
>>> How can a repo be monitored for commits on all branches? I know
>>> how to ask a given branch: do you have any commits after
>>> foo_hash? How do I: * get a list of all commits since foo_hash
>> 
>> A quick note about Warner's reply:
>> 
>>> git log $hash..HEAD
>> 
>> "HEAD" is just a git nickname for "whatever you have currently
>> checked-out" (which can be a branch, a tag, or "detached" commit
>> SHA ID).
> 
> Mathieu mentioned: git log $foo_hash...branch_name
> 
> That was the first time I've seen that used. All previous suggestions
> were HEAD, not branch_name. But they are all the same in this
> context?

I hesitate to say yes, if only because I don't know precisely what
context you're in.

I think the previous suggestions assumed you have checked out the branch
you're interested in.  Your email wasn't clear about that, and if you're
doing "detached" checkouts using commit SHA IDs, "HEAD" might not behave
the way those suggestions assume.

Just remember that the SHA IDs are the fundamental, immutable
identifiers for git commits.  All other names are just
labels/aliases/symbols for a SHA ID, and some of those names will change
which SHA ID they represent, depending on the name's type and what
operations you do in your repository.  "HEAD" is especially mercurial.
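
To make that concrete, here is a small sketch (the function name
"resolve_name" is made up for illustration) that asks git which commit
SHA ID a name currently stands for.  It assumes you run it inside a
working copy:

```shell
# resolve_name NAME
# Print the full commit SHA ID that NAME points at right now.
# NAME can be HEAD, a branch, a tag, or a (possibly abbreviated) SHA ID.
# Prints nothing and returns non-zero if NAME does not resolve to a commit.
resolve_name() {
    git rev-parse --verify --quiet "$1^{commit}"
}
```

For example, "resolve_name HEAD" and "resolve_name origin/main" print
the same SHA ID only while HEAD happens to sit on that commit; after a
fetch or a checkout the answers can differ.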

>>> * know which branch each of those commits was on (e.g. master,
>>> branches/2020Q4)
>> 
>> Unfortunately you'll find most normal git advice to be a bit
>> frustrating with the FreeBSD repos, because FreeBSD doesn't work
>> the way most people use git.  Specifically, the FreeBSD project
>> does not ever merge branches (in the git sense of the word
>> "merge").  Things would be very, very much easier if the FreeBSD
>> project were to use git-style merging.  I believe there are
>> discussions underway about adjusting the whole MFC process for the
>> git world.  I admit that part of my motivation in writing this
>> message is to provide grist for that mill.
>> 
>> Fortunately even without git-merged branches, there are still git
>> tools that help, though they're not as precise as one would like.
>> 
>> Let's look at a concrete example with the beta ports git repo
>> (which I just cloned), and compare the 2020Q4 and main branches.
>> I'll start with some overall exploration, then address your
>> specific question.
>> 
>> There are 298 commits in the 2020Q4 branch.  I know this because 
>> git merge-base origin/main origin/branches/2020Q4 tells me where
>> 2020Q4 branched off of main: commit 5dbe4e5f775ea2.  And git
>> rev-list 5dbe4e5f775ea2..origin/branches/2020Q4 | wc -l says "298".
>> (The "rev-list" command is a bare-bones version of "log" that only
>> lists commit SHA IDs.)
> 
> [examples snipped]
> 
> I followed that.
> 
> I took the merge information as background and good-to-know, because
> FreshPorts won't be doing any merging. It just needs a good "git
> checkout" working copy.
> 
> Sorry for such a long reply.

You're apologizing, after my verbosity? :)

> * How FreshPorts extracts data
> 
> FreshPorts is only interested in a snapshot of the repo with respect
> to a given commit.  It works on the 'repo as a whole' to extract
> values from the ports which were affected by that commit. Case in
> point: a commit to a parent port might affect any or all of the child
> ports.  All the child ports need to be refreshed.

I'm guessing that typically the child ports' stuff (in the ports tree) 
does not change when the parent port changes.  So you're talking about a 
build-time or run-time dependency between ports?  I imagine you have 
some kind of ports-dependency database, which you can consult when a 
particular port is changed.

Or are you thinking git might be able to help you here?

> I am quickly concluding that FreshPorts must decide in advance what
> git branches it will pay attention to. At present, it follows all
> branches.
> 
> * FreshPorts (without git) uses email to create XML
> 
> When moving FreshPorts from subversion to git, one of the goals was
> to avoid relying on email to know that a commit has occurred. That is
> how FreshPorts has always worked. The email (from the CVS commit) was
> parsed and XML created. This code was updated for SVN. The XML is
> then used to load the commit into the FreshPorts database which then
> drives the website contents.
> 
> When I started the GIT conversion, there was no commit email. "git
> log $foo_hash...HEAD" is how FreshPorts knows what commits to
> process.
> 
> One positive aspect of the email approach: it identified the branch. So
> far, I can't see how I can process the repo as a whole and see every
> commit and know what branch it was on.

You're right about that, but you *can* discover all the branches in the 
repo (see below).

> * Polling git
> 
> It is beginning to sound like the FreshPorts git code for detecting
> incoming commits will be:
> 
> Every N minutes, do this:
> 
> for each repo in REPOS
>   for branch in BRANCHES
>     cd to the directory for that repo
>     git checkout branch
>     git log $branch_last_hash...HEAD
>     for each of those commits
>        process the commit
>     end for
>   end for
> end for

That loop looks basically right. You'll need to "git fetch" in each 
repo before processing branches.

Watch out with that "git checkout branch" part: This is where a lot of 
people get tripped up when they move from svn to git.  It doesn't help 
that git tries to do some hand-holding here, but really git users would 
benefit from simply understanding how branch names are just labels for 
commit SHA IDs.  Without that understanding, people end up going down 
the rabbit-hole that is the "git pull" command, a wretched hive of scum 
and villainy if there ever was one.

I'll just gloss over a lot of detail (but feel free to ask!) and 
recommend that you work with the "origin/"-prefixed branch names.  These 
get updated every time you "git fetch" from the remote repo, and 
basically behave as you'd expect.  It's appropriate in your case, 
because you're not creating any new commits.
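
Here is a rough sketch of one pass of your loop done that way, with no
"git checkout" at all.  The function names and the "process" line are
placeholders I made up, and it assumes the remote is called "origin".
I've used the two-dot form of the range; as long as your saved hash is
an ancestor of the branch tip, two dots and three dots list the same
commits.

```shell
# new_commits REPO_DIR BRANCH LAST_HASH
# Print, oldest first, the SHA IDs of the commits on origin/BRANCH that
# are not reachable from LAST_HASH.
new_commits() {
    git -C "$1" rev-list --reverse "$3..origin/$2"
}

# poll_branch REPO_DIR BRANCH LAST_HASH
# Fetch from origin, then walk the new commits on one branch.
poll_branch() {
    git -C "$1" fetch --quiet origin || return 1
    new_commits "$1" "$2" "$3" | while read -r sha; do
        # Stand-in for the real per-commit processing step.
        echo "process $sha on $2"
    done
}
```

You'd call it as something like
"poll_branch /path/to/repo branches/2020Q4 $branch_last_hash".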

> At present, the REPOS and BRANCHES are:
> 
> * freebsd-ports BRANCHES="master branches/2020Q4 branches/2020Q3 branches/2020Q2 ...etc"
> * freebsd BRANCHES="master stable/12 stable/11"
> * freebsd-doc BRANCHES="master"
> 
> Some might ask:
> 
> * Why not just master and latest-quarterly for freebsd-ports?
> * Because commits to older branches sometimes occur (or at least I
> thought I saw one once)
> 
> Commit hooks might also help, but I'm not sure if that will make
> things easier or complicate everything

Looking through the set of hooks ("git help hooks"), I don't see 
anything that would help you.

> * When new branches arrive
> 
> It is vital that FreshPorts remain automated as much as it can be. At
> present, under SVN, I might have to fix things perhaps 5 or 6 times a
> year, usually because a commit did not get processed.
> 
> Keeping that in mind, I do not yet know how to handle the following
> situations:
> 
> A new branch comes out.
>
> * Automation might be possible for ports quarterly branches
> * FreshPorts has to know there is a new branch

Fundamentally you start by doing a "git fetch" to retrieve updates from 
the official repository (you probably know that already, but I just want 
to be pedantic).

Then "git branch -r" will list all the remote ("origin/"-prefixed) 
branches, including whatever new ones were just fetched.

You can also do "git for-each-ref refs/remotes/origin" to see which 
commit SHA ID corresponds to each remote branch name (though the branch 
names are prefixed with "refs/remotes/origin/" not just "origin/").  You 
might find its output easier to parse than the equivalent "git branch 
-rv" suggested by Oliver.

> * BRANCHES needs to be updated
> * I don't see that it can be automated for stable/*

Instead of saving BRANCHES locally, just use the output of one of the 
above commands every time you run the script.  That way the list will 
always be up to date.
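
As a sketch of what I mean (the function name is made up, and it
assumes the remote is called "origin"), this turns the "git
for-each-ref" output into one "SHA-ID branch-name" pair per line:

```shell
# list_remote_branches REPO_DIR
# Print "<sha> <branch>" for every branch known from origin, with the
# "refs/remotes/origin/" prefix removed, e.g. "<sha> branches/2020Q4".
list_remote_branches() {
    git -C "$1" for-each-ref --format='%(objectname) %(refname)' refs/remotes/origin |
        sed -e 's| refs/remotes/origin/| |' |
        grep -v ' HEAD$'    # origin/HEAD is a pointer to a branch, not a branch
}
```

Run it after each "git fetch" and any new quarterly or stable branch
simply shows up as one more line.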

> * need to handle 'git checkout branch' when branch does not exist?

If you end up doing that, I'd say you're doing something wrong.

> * Once branch exists, how do you find out about the commits when you have no
> starting point for 'git log'?

Use "git merge-base" to find where the new branch split off from the 
"master" or "main" branch.

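Roughly like this, as a sketch.  The function names are made up, the
remote is assumed to be "origin", and the trunk defaults to "main"
unless you pass a third argument such as "master":

```shell
# branch_start REPO_DIR BRANCH [TRUNK]
# Print the commit where origin/BRANCH split off from origin/TRUNK.
branch_start() {
    git -C "$1" merge-base "origin/${3:-main}" "origin/$2"
}

# commits_on_new_branch REPO_DIR BRANCH [TRUNK]
# Print, oldest first, the commits made on BRANCH since it split off.
commits_on_new_branch() {
    start=$(branch_start "$@") || return 1
    git -C "$1" rev-list --reverse "$start..origin/$2"
}
```

So for a branch you've never seen before, the output of "branch_start"
plays the role of your $branch_last_hash.
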
> Right now, a new quarterly branch is noticed when the first commit
> email comes through.  FreshPorts then does an 'svn co' for that
> branch.
> 
> I'm hoping someone has good ideas for my edge cases.

I hope I've been a bit helpful!

One last, small idea:  Consider using tags in your local repos to track 
your $branch_last_hash.  For example, after you finish processing the 
2020Q4 branch, do
   git tag -f last-2020Q4 origin/branches/2020Q4
Then the next time through the 2020Q4 branch you can start at the 
"last-2020Q4" tag.  For a new branch that doesn't have a tag yet, use 
"git merge-base" to find the starting point.

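Putting the tag idea and the merge-base fallback together, as a sketch
only: the function names are made up, the remote is assumed to be
"origin", and the tag name is built from the last path component of the
branch name to match the "last-2020Q4" example.

```shell
# start_point REPO_DIR BRANCH [TRUNK]
# Print the commit to start from: the local "last-..." tag if it exists,
# otherwise the point where origin/BRANCH split off from origin/TRUNK
# (TRUNK defaults to "main").
# Caveat: "stable/12" and "releng/12" would both map to "last-12".
start_point() {
    tag="last-${2##*/}"
    git -C "$1" rev-parse --verify --quiet "refs/tags/$tag^{commit}" ||
        git -C "$1" merge-base "origin/${3:-main}" "origin/$2"
}

# mark_processed REPO_DIR BRANCH
# Move the local tag to the current tip of origin/BRANCH.
mark_processed() {
    git -C "$1" tag -f "last-${2##*/}" "origin/$2" >/dev/null
}
```

Call "start_point" before processing a branch and "mark_processed"
once you're done with it.
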
		M.