Monitoring commits on all branches

Sat Nov 21 14:12:27 UTC 2020

On Thu, Nov 19, 2020, at 5:30 PM, Marc Branchaud wrote:
> On 2020-11-19 2:00 p.m., Dan Langille wrote:
> > * How FreshPorts extracts data
> > 
> > FreshPorts is only interested in a snapshot of the repo with respect
> > to a given commit.  It works on the 'repo as a whole' to extract
> > values from the ports which were affected by that commit. Case in
> > point: a commit to a parent port might affect any or all of the child
> > ports.  All the child ports need to be refreshed.
> 
> I'm guessing that typically the child ports' stuff (in the ports tree) 
> does not change when the parent port changes.  So you're talking about a 
> build-time or run-time dependency between ports?  I imagine you have 
> some kind of ports-dependency database, which you can consult when a 
> particular port is changed.

If the parent port changes PORTVERSION, then make -V PORTVERSION on the child
port probably changes value.  The database entry for the child is updated
to make sure it reflects what is now declared in the parent.

All child ports are updated when a commit affects a parent port.

> Or are you thinking git might be able to help you here?

This is just background information as to why I need a snapshot (i.e. check out a commit).

> > I am quickly concluding that FreshPorts must decide in advance what
> > git branches it will pay attention to. At present, it follows all
> > branches.
> > 
> > * FreshPorts (without git) uses email to create XML
> > 
> > When moving FreshPorts from subversion to git, one of the goals was
> > to avoid relying on email to know that a commit has occurred. That is
> > how FreshPorts has always worked. The email (from the CVS commit) was
> > parsed and XML created. This code was updated for SVN. The XML is
> > then used to load the commit into the FreshPorts database which then
> > drives the website contents.
> > 
> > When I started the GIT conversion, there was no commit email. "git
> > log $foo_hash...HEAD" is how FreshPorts knows what commits to
> > process.
> > 
> > One positive aspect of the email approach: it identified the branch. So
> > far, I can't see how I can process the repo as a whole and see every
> > commit and know what branch it was on.
> 
> You're right about that, but you *can* discover all the branches in the 
> repo (see below).
> 
> > * Polling git
> > 
> > It is beginning to sound like the FreshPorts git code for detecting
> > incoming commits will be:
> > 
> > Every N minutes, do this:
> > 
> > for each repo in REPOS
> >   for branch in BRANCHES
> >     cd to the directory for that repo
> >     git checkout branch
> >     git log $branch_last_hash...HEAD
> >     for each of those commits
> >        process the commit
> >     end for
> >   end for
> > end for
> 
> That loop looks basically right. You'll need to "git fetch" in each 
> repo before processing branches.

I was usually doing a git pull.  I've been reading up on 'git pull' vs 'git fetch'.
I think I will move to 'git fetch'.

To reiterate: FreshPorts basically gets a read-only copy of the repo. It never
does local mods. It just needs the files. It is similar to 'svn up' with never
changing the files.  An 'svn export' would be workable, but in practice, 'svn up'
is faster.

This is what I was reading:

* https://stackoverflow.com/questions/292357/what-is-the-difference-between-git-pull-and-git-fetch
* https://longair.net/blog/2009/04/16/git-fetch-and-merge/

> Watch out with that "git checkout branch" part: This is where a lot of 
> people get tripped up when they move from svn to git.  It doesn't help 
> that git tries to do some hand-holding here, but really git users would 
> benefit from simply understanding how branch names are just labels for 
> commit SHA IDs.  Without that understanding, people end up going down 
> the rabbit-hole that is the "git pull" command, a wretched hive of scum 
> and villainy if there ever was one.

I understand the concern and I think I follow.  Within the confines of
'only wanting the files for reference, never local modification', pull and 
fetch might be the same for me. I'll look at fetch more.
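
For a read-only consumer, the fetch-based variant of the polling step might
look roughly like the sketch below. It is only an illustration, not what
FreshPorts runs: new_commits is a hypothetical helper, and the repo directory,
branch name and last processed hash are assumed to come from the caller. It
uses a two-dot range, which lists commits reachable from the remote-tracking
branch but not from the last processed hash; a three-dot range would also
include commits reachable only from the left-hand side.

```shell
#!/bin/sh
# Sketch only: list commits that arrived on one remote-tracking branch since
# the last processed hash, oldest first, without touching the work tree.
# new_commits is a hypothetical helper name.
new_commits() {
    repo_dir=$1; branch=$2; last_hash=$3
    # fetch updates the origin/* refs; it never merges into a local branch
    git -C "$repo_dir" fetch --quiet origin || return 1
    # commits reachable from origin/$branch but not from $last_hash
    git -C "$repo_dir" rev-list --reverse "$last_hash..origin/$branch"
}
```

Each hash printed could then be handed to whatever processes a single commit,
for example: new_commits /path/to/repo master "$branch_last_hash"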

> I'll just gloss over a lot of detail (but feel free to ask!) and 
> recommend that you work with the "origin/"-prefixed branch names.  These 
> get updated every time you "git fetch" from the remote repo, and 
> basically behave as you'd expect.  It's appropriate in your case, 
> because you're not creating any new commits.
> 
> > At present, the REPOS and BRANCHES are:
> > 
> > * freebsd BRANCHES="master branches/2020Q4 branches/2020Q3 branches/2020Q2 ...etc"
> > * freebsd-ports BRANCHES="master stable/12 stable/11"
> > * freebsd-doc BRANCHES="master"
> > 
> > Some might ask:
> > 
> > * Why not just master and latest-quarterly for freebsd-ports?
> > * Because commits to older branches sometimes occur (or at least I
> > thought I saw one once)
> > 
> > Commit hooks might also help, but I'm not sure if that will make
> > things easier or complicate everything
> 
> Looking through the set of hooks ("git help hooks"), I don't see 
> anything that would help you.

My idea for a hook: FreeBSD tells FreshPorts when a new commit arrives.
FreshPorts wakes up and processes it.

> > * When new branches arrive
> > 
> > It is vital that FreshPorts remain automated as much as it can be. At
> > present, under SVN, I might have to fix things perhaps 5 or 6 times a
> > year, usually because a commit did not get processed.
> > 
> > Keeping that in mind, I do not yet know how to handle the following
> > situations:
> > 
> > A new branch comes out.
> >
> > * Automation might be possible for ports quarterly branches
> > * FreshPorts has to know there is a new branch
> 
> Fundamentally you start by doing a "git fetch" to retrieve updates from 
> the official repository (you probably know that already, but I just want 
> to be pedantic).
> 
> Then "git branch -r" will list all the remote ("origin/"-prefixed) 
> branches, including whatever new ones were just fetched.
> 
> You can also do "git for-each-ref refs/remotes/origin" to see which 
> commit SHA ID corresponds to each remote branch name (though the branch 
> names are prefixed with "refs/remotes/origin/" not just "origin/").  You 
> might find its output easier to parse than the equivalent "git branch 
> -rv" suggested by Oliver.

Thanks.
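
A sketch of how the for-each-ref output could be turned into a simple
"hash branch" list for a script to loop over. The helper name
list_remote_branches is made up for this example, and it assumes the remote
is called "origin". The symbolic origin/HEAD entry is filtered out because it
only points at another branch.

```shell
#!/bin/sh
# Sketch: print "<sha> <branch>" for every branch known from remote "origin".
# list_remote_branches is a hypothetical helper name.
list_remote_branches() {
    # strip=3 drops the leading "refs/remotes/origin/" path components
    git -C "$1" for-each-ref \
        --format='%(objectname) %(refname:strip=3)' refs/remotes/origin |
        grep -v ' HEAD$'
}
```

Run after a "git fetch", this would pick up new branches without a
hand-maintained BRANCHES list.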

> > * BRANCHES needs to be updated
> > * I don't see that it can be automated for stable/*
> 
> Instead of saving BRANCHES locally, just use the output of one of the 
> above commands every time you run the script.  That way the list will 
> always be up to date.

Yes, I think so too.

> 
> > * need to handle 'git checkout branch' when branch does not exist?
> 
> If you end up doing that, I'd say you're doing something wrong.

Agreed.

> > * Once branch exists, how do you find out about the commits when you have no
> > starting point for 'git log'?
> 
> Use "git merge-base" to find where the new branch split off from the 
> "master" or "main" branch.

That is my starting commit for that branch?
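
As I understand the suggestion, it could be sketched like this: the merge
base is the last commit the new branch shares with the mainline, so the
commits to process are the ones after it. branch_only_commits is a
hypothetical helper, and "master" as the default mainline name is an
assumption (the repo may use "main").

```shell
#!/bin/sh
# Sketch: commits on a newly seen branch that are not on the mainline.
# branch_only_commits is a hypothetical helper name.
branch_only_commits() {
    repo_dir=$1; branch=$2; mainline=${3:-master}
    base=$(git -C "$repo_dir" merge-base "origin/$mainline" "origin/$branch") || return 1
    # the merge base itself is excluded: it is already part of the mainline
    git -C "$repo_dir" rev-list --reverse "$base..origin/$branch"
}
```

For a branch with no commits of its own yet, the range is empty and nothing
is printed.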

> 
> > Right now, a new quarterly branch is noticed when the first commit
> > email comes through.  FreshPorts then does an 'svn co' for that
> > branch.
> > 
> > I'm hoping someone has good ideas for my edge cases.
> 
> I hope I've been a bit helpful!

More than a bit.

> One last, small idea:  Consider using tags in your local repos to track 
> your $branch_last_hash.  For example, after you finish processing the 
> 2020Q4 branch, do
>    git tag -f last-2020Q4 origin/branches/2020Q4
> Then the next time through the 2020Q4 branch you can start at the 
> "last-2020Q4" tag.  For a new branch that doesn't have a tag yet, use 
> "git merge-base" to find the starting point.

Tags in my local repos won't get things upset?

If I lose the local repo....
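
A sketch of how the tag idea might cope with a lost or freshly cloned repo:
if the tag is missing, fall back to the merge base. Tags created this way
stay in the local repo unless they are pushed. start_point, mark_processed
and tag_name are hypothetical helpers; the "last-" prefix follows the example
above, slashes in branch names are turned into dashes as an arbitrary choice,
and "master" as the mainline is an assumption.

```shell
#!/bin/sh
# Sketch: remember the last processed commit per branch with a local tag.
# All helper names here are hypothetical.
tag_name() { printf 'last-%s' "$(printf '%s' "$1" | tr '/' '-')"; }

# Print the commit to resume from for a branch.
start_point() {
    repo_dir=$1; branch=$2; mainline=${3:-master}
    tag=$(tag_name "$branch")
    if git -C "$repo_dir" rev-parse -q --verify "refs/tags/$tag^{commit}" >/dev/null; then
        git -C "$repo_dir" rev-parse "refs/tags/$tag^{commit}"
    else
        # no tag yet: new branch, or the local repo was recreated
        git -C "$repo_dir" merge-base "origin/$mainline" "origin/$branch"
    fi
}

# Move the tag to the current tip of the remote-tracking branch.
mark_processed() {
    git -C "$1" tag -f "$(tag_name "$2")" "origin/$2" >/dev/null
}
```

After a rebuild from scratch, the fallback would reprocess the branch from
where it left the mainline, so the loader would need to tolerate commits it
has already seen.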

-- 
  Dan Langille
  dan at langille.org