OpenZFS branch tracking policy

Warner Losh imp at bsdimp.com
Sat Apr 10 19:22:50 UTC 2021


Thanks for the update Martin.

The tl;dr is I think this will be fine. However, I'd like to document the
reasoning here for future cases that we may need to judge. There are also a
couple of logistical issues at the end that we need to address, one of them
critical.

On Sat, Apr 10, 2021 at 11:15 AM Martin Matuska <mm at freebsd.org> wrote:

> Here are some of the facts:
>
> - In my merge, there are 15 conflicting files due to changes in FreeBSD
> (add/add)
> - Some of the changes have already been upstreamed in later revisions of
> openzfs than 891568c99
> - A significant majority of the diffs are candidates for upstreaming. The
> ideal state would be to have all changes upstreamed. Sometimes changes get
> upstreamed with modifications.
> - In general our developers open pull requests and commit to OpenZFS, then
> we merge the changes
>
> What our developers would like is to use a "git blame" on
> sys/contrib/openzfs/something to see the history path from OpenZFS.
>
> I agree that the merge commits should be more verbose, ideally containing
> a "git log --oneline" of the commits since last merge.
>
> If I do a "squashed" merge like you described with bzip2, then I do not
> import the history from OpenZFS. That way we don't need that at all and can
> continue working the way we did until now.
>
> As for what you say about adding "unnecessary" history: since development
> is shared at OpenZFS, the majority of commits directly affect FreeBSD.
> Only "Linux-Only" and "CI-related" commits are not relevant for FreeBSD.
>
> I have updated my example branch to show how it may look with more detailed
> commit messages, nicely clickable from github:
> https://github.com/mmatuska/freebsd-src/tree/openzfs_master_merged
>
> So the current question is quite simple, we can do one of the
> following:
> a) do the unsquashed merge I suggest that imports the openzfs history -
> this will make the commits very transparent, future merges and upstream
> tracking very easy, and the --allow-unrelated-histories flag is not required
> anymore. The "common" part of the histories in main and stable/13 will be
> identical.
> b) if that is not desired or we are undecided I will continue the way we
> go now until a better solution is found. In that case I will fork a second
> vendor branch (vendor/openzfs-2.1) that starts with the latest common
> commit of openzfs/master and openzfs/zfs-2.1-release and will merge (or
> cherry-pick?) from this branch directly to stable/13. As an alternative to
> merging, git cherry-pick supports -Xsubtree= as well.
>
I'm leaning towards 'a', but that's a new way for the project to track
vendor changes. Many of my comments were on how to mirror pulling in
upstreams that we would want to do infrequently, and where we didn't care
about the details so much. llvm is a good example, as would be bzip2, though
for different reasons. The former more due to the sheer size of the llvm
repo and the extremely infrequent need for users and developers of FreeBSD
to peer into the details. They simply aren't relevant for those cases. For
these cases, a squashed commit makes sense: people don't care about the
details, it keeps our repo size manageable, and 'b' is appropriate. I had
initially thought OpenZFS would fall into this category, but your
additional details suggest that my initial thinking might be a poor fit for
our needs.
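
To make the trade-off concrete, here is a minimal sketch that runs both
kinds of subtree merge in throwaway repositories and compares how many
commits each leaves behind. Everything in it is an assumption made for
illustration: the contrib/demo path, the vendor/demo branch and the a.c/b.c
files are hypothetical stand-ins, not the real OpenZFS layout. The src side
already carries a copy of the vendor code, as in the bzip2 and OpenZFS
experiments.

```shell
#!/bin/sh
# Sketch only: contrast a squashed and an unsquashed subtree merge.
# All repository, branch, and file names here are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org
work=$(mktemp -d)

# Stand-in for the upstream project: two commits, two files.
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/master
echo 'int a;' > a.c && git add a.c && git commit -q -m 'upstream: add a.c'
echo 'int b;' > b.c && git add b.c && git commit -q -m 'upstream: add b.c'

# Stand-in for src: it already carries an older copy under contrib/demo.
git init -q "$work/src"
cd "$work/src"
git symbolic-ref HEAD refs/heads/main
mkdir -p contrib/demo
echo 'int a;' > contrib/demo/a.c
git add contrib && git commit -q -m 'src: import demo'

# Bring the upstream history into a local vendor branch.
git fetch -q "$work/upstream" master:refs/heads/vendor/demo

# Squashed: one new commit, no upstream history attached.
git checkout -q -b squashed main
git merge -q -s subtree -Xsubtree=contrib/demo \
    --allow-unrelated-histories --squash vendor/demo >/dev/null
git commit -q -m 'contrib/demo: squashed update'

# Unsquashed: a merge commit whose second parent is the upstream history.
git checkout -q -b full main
git merge -q -s subtree -Xsubtree=contrib/demo \
    --allow-unrelated-histories \
    -m 'contrib/demo: merge vendor/demo' vendor/demo >/dev/null

squashed_count=$(git rev-list --count squashed)
full_count=$(git rev-list --count full)
echo "squashed: $squashed_count commits, full: $full_count commits"
```

The difference between the two counts is the upstream history plus the
merge commit itself, which is the repo growth being weighed here.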

I think that you've made a compelling case to merge in the tree. Since this
is something new, the potential downsides need to be looked at. First is
size. From the numbers you provided, OpenZFS is on the larger side of
things we'd want to do this with. The expansion of the repo is concerning,
so there would need to be some benefit from that. Here, you've clearly
articulated the benefit: our OpenZFS developers drift back and forth
between OpenZFS and FreeBSD and do development in both places. If these
merges are frequent, this allows a more efficient workflow for OpenZFS
maintenance. This also allows better bisecting in the case of trouble. One
reason we don't generally want to open things up to merge commits is the
crazy merges we did with svn that created weird loops. While the git
transition work endeavored to eliminate them, a number slipped through. We
do not want any more of them created. By that test, these commits pose no
risk given the OpenZFS practices (and little risk outside the
contrib/openzfs tree).

So, the practical aspects of this: how do we do this? We'll need to have
the OpenZFS mainline and branches in the tree, so the question of what
namespace to put them into comes to mind. The obvious answer would be
'openzfs' or 'vendor/openzfs', but you want two branches, so
maybe vendor/openzfs/main (or master, whatever it is called upstream) and
vendor/openzfs/<branch-name> would be better, since we could then recommend
a 'refs' line for people working on openzfs that would let git do all the
heavy lifting here. There's no issue with having both vendor/openzfs and
vendor/openzfs/<foo> in the tree at the same time, I don't think. The
current rule sets would allow this, and you could carefully push both the
branches first. I don't think we need to do anything special except
document how to do the first commit (for others who need to do this) and
document how to update, which I'm more than happy to help out with.
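
As one possible reading of that 'refs' line, here is a sketch of a narrowed
fetch refspec that tracks only the branches under vendor/openzfs/. The
remote name "freebsd", the branch names, and the throwaway server
repository are all assumptions for the demo, not a description of how the
real repo is set up.

```shell
#!/bin/sh
# Sketch only: a narrowed fetch refspec that tracks just the OpenZFS vendor
# branches. The remote name "freebsd" and every branch name are assumptions.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org
work=$(mktemp -d)

# Throwaway "server" repository with a few branches in a vendor/ namespace.
git init -q "$work/server"
cd "$work/server"
git symbolic-ref HEAD refs/heads/main
echo base > README && git add README && git commit -q -m 'base'
git branch vendor/openzfs/master
git branch vendor/openzfs/zfs-2.1-release
git branch vendor/other

# Client that only wants the OpenZFS vendor branches.
git init -q "$work/client"
cd "$work/client"
git remote add freebsd "$work/server"
# Replace the default "fetch every branch" refspec with a narrower one.
git config remote.freebsd.fetch \
    '+refs/heads/vendor/openzfs/*:refs/remotes/freebsd/vendor/openzfs/*'
git fetch -q freebsd

# List the remote-tracking refs that the narrowed refspec produced.
tracked=$(git for-each-ref --format='%(refname:short)' \
    refs/remotes/freebsd/vendor)
echo "$tracked"
```

With a layout like this, a plain "git fetch freebsd" keeps every branch
under the namespace up to date without naming each one.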

One critical thing we need to assess before you proceed, however: mail. We
need to make sure we're not about to send 7k emails as all these revisions
suddenly appear in the repo... Having an extra 7k revs in the repo
will be no problem, but 7k extra emails might raise a comment or two...
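
A rough way to size this before pushing anything is to count the commits
reachable from the vendor branch but not from main. The sketch below does
that in throwaway repositories. The branch names are hypothetical, and it
assumes the mail hook sends one message per newly pushed commit, which
would need to be checked against the real hook.

```shell
#!/bin/sh
# Sketch only: count the commits a history-preserving import would add.
# Repository and branch names are hypothetical stand-ins.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org
work=$(mktemp -d)

# Stand-in for upstream with three commits.
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/master
for i in 1 2 3; do
    echo "$i" > file.c
    git add file.c
    git commit -q -m "upstream change $i"
done

# Stand-in for src with its own unrelated history.
git init -q "$work/src"
cd "$work/src"
git symbolic-ref HEAD refs/heads/main
echo base > README
git add README
git commit -q -m 'src base'

# Fetch the upstream history into a local vendor branch.
git fetch -q "$work/upstream" master:refs/heads/vendor/openzfs/master

# Commits reachable from the vendor branch but not from main.
new_commits=$(git rev-list --count vendor/openzfs/master ^main)
echo "commits that would newly appear: $new_commits"
```

Run against the real branches, the same rev-list invocation would give the
number to compare against whatever the mail hook does per commit.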

Comments?

Warner


> Best regards,
> mm
> On 10. 4. 2021 0:15, Warner Losh wrote:
>
>
>
> On Fri, Apr 2, 2021 at 6:44 PM Martin Matuska <mm at freebsd.org> wrote:
>
>> I have prepared an example merged branch here:
>> https://github.com/mmatuska/freebsd-src/tree/openzfs_master_merged
>>
>> The magical command was:
>> git merge -s subtree -Xsubtree="sys/contrib/openzfs" 891568c99
>> --allow-unrelated-histories
>>
>> Luckily, our current diff is manageable.
>>
>
> So I did this for bzip2 using approximately:
>
> git remote add bzip2 <url>
> git fetch bzip2
> git merge -s subtree -Xsubtree=contrib/bzip2 bzip2/master
> --allow-unrelated-histories --squash
>
> [1] At this point I resolved conflicts, which covered the entire files since
> I guess I didn't bootstrap right to the last merge. There were 4 files in
> conflict.
>
> Then I did a git add of all the files in conflict and a git commit.
>
> This produced a good commit. Since it was a squash commit, there were no
> issues.
>
> However, it turns out I botched the commit at point [1] above. So I ran
> this again and got a conflict for the whole file that I'd removed a blank
> line from.
>
> So, this looks like it could be workable, but does lead me to a few
> questions:
>
> (1) How do we do this so that the conflicts aren't add/add conflicts? Is
> there some way to bootstrap this?
> (2) Do we need to keep track of the last merge point and use that in
> merging the next one in?
> (3) I assume we keep track of FreeBSD diffs in a branch off <url> and we
> merge that instead of master.
> (4) What do we do about adjustments to the build that are needed?
> (5) Do we need to host a FreeBSD-specific repo with this stuff, maybe with
> tags we don't want widely pushed to ease the next merge? Eg, make this the
> first case of a 'vendor repo' that we then pull squash commits from so that
> the vendor repo can track upstream, but not otherwise be pushed to all our
> users....
>
> Finally, how did you deal with [1] producing so many full-file add/add
> conflicts? Oh, and what kind of commit message when things merge do you
> suggest? I rather like your 'bring in hash XXXX branch blah, here's the
> important highlights' emails and think that would be a good first cut at
> advice on what to put in these.
>
> This suggests the current answer is 'seems doable, but we need to document
> it and come up with recommendations for how to do it'.
>
> Warner
>
> On 3. 4. 2021 1:37, Martin Matuska wrote:
>> > Hi Warner and Ed,
>> >
>> > 2.1-release has already been branched. The stable branch policy in
>> > OpenZFS is somewhat strange, they make a staging branch for each
>> > patchlevel release, but the commits are continuous.
>> >
>> > To have some idea how big the repo history is:
>> >
>> > $ git rev-list master --count
>> > 6662
>> >
>> > $ git rev-list zfs-2.1-release --count
>> > 6650
>> >
>> > master and zfs-2.1-release have 6650 common commits at the moment
>> >
>> > $ git log master | wc -l
>> > 129868
>> >
>> > (linecount - 4 * revcount) / revcount = linecount / revcount - 4 =
>> > 15.4938 comment lines per commit on average
>> >
>> > Initial commit was made in Feb 26, 2008.
>> >
>> > Yearly commit counts:
>> >
>> > $ git log master | grep -c -E '^Date:.* 2020 -[0-9]+$'
>> > 666
>> >
>> > $ git log master | grep -c -E '^Date:.* 2019 -[0-9]+$'
>> > 535
>> >
>> > $ git log master | grep -c -E '^Date:.* 2018 -[0-9]+$'
>> > 428
>> >
>> > Martin
>> >
>> > On 2. 4. 2021 20:15, Warner Losh wrote:
>> >>
>> >>
>> >> On Fri, Apr 2, 2021 at 11:56 AM Ed Maste <emaste at freebsd.org> wrote:
>> >>
>> >>     On Fri, 2 Apr 2021 at 11:50, Warner Losh <imp at bsdimp.com> wrote:
>> >>     >
>> >>     > We'd always hoped that we'd be able to do subtree merges from upstreams
>> >>     > that use git into FreeBSD. The big worry, though, was that this would
>> >>     > needlessly bloat the repo with a lot of history. We don't want, for example,
>> >>     > all of LLVM's history in the tree. We'd always anticipated that there'd be
>> >>     > some things we'd just accept the history for, since it is similar in
>> >>     > character to the vendor branches (though of course a bit more).
>> >>
>> >>     Note that if we do want to avoid bringing in the full history `git
>> >>     subtree merge` supports a `--squash` option. This brings in the set of
>> >>     upstream changes as a single commit, without bringing along the
>> >>     associated history. We will need to do more experimentation to confirm
>> >>     that the full process, including bootstrapping, will work as we want.
>> >>     Assuming this all works it should allow us to forgo the use of a
>> >>     FreeBSD-specific vendor branch in src.
>> >>
>> >>     We've discussed mirroring any such 3rd-party source in some
>> >>     FreeBSD-controlled repository. This would allow the project to retain
>> >>     a full copy of the history, but avoid bloating src with it.
>> >>
>> >>     I agree with Warner that we may want a different policy (full history
>> >>     or snapshots) for different contrib sources.
>> >>
>> >>
>> >> Good points Ed. I'd forgotten about --squash.
>> >>
>> >> Martin, what's your timeline for wanting to implement these things?
>> >> I'm unfamiliar with the OpenZFS schedules.
>> >>
>> >> Warner
>>
>

