Next odd commit affecting `git subtree split` experiments with contrib/elftoolchain

Ulrich Spörlein uqs at freebsd.org
Thu Jun 18 08:44:39 UTC 2020


On Wed, Jun 17, 2020 at 6:47 PM Ed Maste <emaste at freebsd.org> wrote:

> On Wed, 17 Jun 2020 at 12:00, Ulrich Spörlein <uqs at freebsd.org> wrote:
> >
> > Running the subtree split, I get a history with about 437 commits. I see
> in your https://github.com/emaste/elftoolchain/tree/split-from-cgit-beta
> that you only end up with 277 commits (if that display is to be trusted).
>
> Are you using unmodified subtree split, from git port/pkg? The patch
> set from Tom Clarkson improves the detection of mainline vs subtree
> significantly. In the existing cgit-beta (without the MFH changes you
> discussed here) it produces a subtree with tens/hundreds of thousands
> of commits, because a mainline commit "leaks" into the subtree via a
> merge. The patched git subtree is what I used for the split
> elftoolchain that I shared.
>

Yes, this is using plain git subtree w/o patches, but it's on a repo that
has no MFH
head → project merges. As I wrote, it comes out with ~400 commits, while a
patched
subtree split will only produce about 280, so going with the patched
subtree seems
more sensible.


> > I'm not sure whether it would be straightforward to squash the right
> commits and keep
> > the ones with the proper commit message. Your repo still has a view MFH
> commits that
> > one might want to remove. Using git `filter-repo` might do the trick ...
>
> Indeed, although I'm not particularly concerned if there are a few
> stray MFH commits - it's a little bit of clutter but accurately
> represents what happened in that subtree in the svn world.
>

I tried filter-repo on your elftoolchain history, and it does nothing :(
There are still many tree objects that are identical between commits, but
they are usually merges, so it's not a simple "empty" commit that we can
toss
out.

However, I wonder if we cannot flatten the history to be a single line
instead of the merges.
Do merge commits make any sense for the resulting repo? If we could make it
all linear, then
the "empty" commits would stand out better and filter-repo could toss them
away.

Here's what I mean:

* |   e7db0cf7 0a4a2d68  - Add META_MODE support.
|\ \
| * \   3cc0c3e8 935facd3  - Merge sync of head
| |\ \
| * | | a16e339e d637a411  - Merge from head
| |\| |
* | | | eb3e8834 0a4a2d68  - elfcopy: Handle objects without a ".shstrtab"
section string table
...
| |/
|/|
* | b5199d77 d637a411  - Fix the conversion macro for .note sections,
broken in the cas

There's a long "branch" that was created from b5199d77, then sees 2 commits
(with different trees, mind you) a16e339e and 3cc0c3e8.
But once it is merged back into main with e7db0cf7, the resulting tree
is 0a4a2d68, which is just the same pre-merge (in commit eb3e8834).
So the whole exercise of merging in the off-shoot branch doesn't alter the
tree at all.

The algorithm would be simple:
- for all merge commits, check their tree vs. the tree of all parents.
  IFF one of the parents has the same tree as the merge commit
    → remove the merge commit

The last step is actually tricky, as you need to update all the children of
the merge commit with the new parent. You need to check all revisions for
that (in your repo, there are actually 2!)

Doing it manually with a graft:
% git replace --graft e7db0cf7 eb3e8834
results in
* | e7db0cf7 0a4a2d68  - (replaced) Add META_MODE support.
* | eb3e8834 0a4a2d68  - elfcopy: Handle objects without a ".shstrtab"
section string table
* | 666a6d09 7d6c3ca1  - Update to ELF Tool Chain r3223

This looks useful, I guess. There is a different case of history also,
where a branch is merged in multiple times, but the branch never had a
commit so far!

* | ab49ac72 2fcd31e1  - Copy elftoolchain readelf from vendor branch
|\|
* | b5f0dac9 00d9d606  - Correct elftoolchain strip(1) memory size
calculation
* | daa082dd 57cf3e02  - libelf: Fix cross-endian ELF note file / memory
conversion
* | c7ce7bba 41742fa1  - Track libarchive API change
* | 1481386a 62292f3c  - Temporarily disable non-FreeBSD NT_ note types
* | 2b9d3d05 d080f9c9  - Fix elftoolchain tools in-tree build
* | f4b35757 b87618f0  - Copy elftoolchain binutils replacements from
vendor branch
|\|
* | b5199d77 d637a411  - Fix the conversion macro for .note sections,
broken in ...
* | eda82609 cffe5c27  - GCC for PowerPC does not align  .note secti...
* | 0501d6e9 b7163de0  - Reapply r221569, r233401, r233524 and r255105: Add
support ...
* | 28d52963 195f7d2b  - Remove trailing whitespace.
* | 3342d176 e298487d  - * Allow API dwarf_loclist_n() and dwarf_loclist()
to be called with ...
* | 297fc330 e3796a34  - Add a sanity check: The provided offset for the
desired location ...
* | 8bd68bad 4613fb47  - API dwarf_attrval_flag() should properly handle an
attribute ...
* | 039bcdce 651ee9b1  - Fix typo: the public API dwarf_child() should
return ...
* | 605443d0 c431e1b8  - Fix a warning in libdwarf found by
-Wmissing-variable
* | 6458e5be f938195c  - Apply r241720 by ed:
* | fdaacf72 322e4f0d  - Use FreeBSD's ELF headers instead of the
elfdefinitions.h ...
* | afb9ed40 bcc80b4a  - Copy libelf, libdwarf and common files from
vendor/ to contrib/.
|/
* 5265ace0 13068447  - Initial import of elftoolchain r2974.

The branch on the right is merged in f4b35757 and ab49ac72, but it never
actually had a commit itself till then.
I don't know how to detect that, but let's graft it away anyway.

% git replace --graft f4b35757 b5199d77
% git replace --graft ab49ac72 b5f0dac9
% git filter-repo --debug --prune-empty=always --prune-degenerate=always
--force

This results in at most (!) 3 branches active at the same time, and that
only for 1 commit anyway. All MFHs are gone.
The graph output has the branches swapped, so a diff(1) is useless, you
need to manually look at them side-by-side

(I hope this will be readable, the hashes are tree-hashes, left is old,
right is after 3 grafts and filter-repo)

* | f5beebdd Update to ELF Tool Chain r3|* | f5beebdd Update to ELF Tool
Chain r34
|\|                                     ||\|

| * 5d850df4 Import ELF Tool Chain snaps|| * 5d850df4 Import ELF Tool Chain
snapsh
| * 4faefe71 Import ELF Tool Chain snaps|| * 4faefe71 Import ELF Tool Chain
snapsh
* | ce92a093 elfcopy: map all !alnum cha|* | ce92a093 elfcopy: map all
!alnum char
* |   e1c6f66f MFH                      |* | e1c6f66f elfcopy: fix symbol
table ha
|\ \                                    |* | fba12860 elfcopy: overhaul of
LMA han
| * | e1c6f66f elfcopy: fix symbol table|* | b6574d32 libelf: correct byte
count i
* | | fba12860 MFH                      |* | ecdda97c libdwarf: fix SHT_REL
reloca
|\| |                                   |* | f3d41e3c elfcopy: fail if
debug link
| * | fba12860 elfcopy: overhaul of LMA |* | 6c9c0b57 Allow elfcopy to
convert bet
* | | b6574d32 MFH                      |* | a6ca6b74 Update ELF Tool Chain
to ups
|\| |                                   ||\|

| * | b6574d32 libelf: correct byte coun|| * 3de9500b Import ELF Tool Chain
snapsh
| * | ecdda97c libdwarf: fix SHT_REL rel|* | 848f4fb5 readelf: decode
AArch64 TLS
* | | f3d41e3c MFH                      |* | bc4cfa9f readelf: report value
of unk
|\| |                                   |* | 75cc50b8 readelf: avoid
accidental fa
| * | f3d41e3c elfcopy: fail if debug li|* | 641e5321 Add config for RISC-V
ISA.
* | | 6c9c0b57 MFH                      |* | f4bd9e9f Fixed uninitialized
variable
|\| |                                   |* | 47b58c30 Update to ELF Tool
Chain r32
| * | 6c9c0b57 Allow elfcopy to convert ||\|

* | | a6ca6b74 MFH                      || * fa4048de Import ELF Tool Chain
snapsh
|\| |                                   |* | eff30dab elfcopy: include
extension b
| * | a6ca6b74 Update ELF Tool Chain to |* | 49fea608 elfcopy: exclude
extension w
| |\|                                   |* | e2028f7d readelf: add Xen ELF
notes
| | * 3de9500b Import ELF Tool Chain sna|* | e1809d24 Add missing commas

* | | 848f4fb5 MFH                      |* | 3d703d04 Add definitions for
MIPS TLS
|\| |                                   |* | 3f2daddd addr2line: initialize
die to
| * | 848f4fb5 readelf: decode AArch64 T|* | 1314d2c4 Update to ELF Tool
Chain r32
| * | bc4cfa9f readelf: report value of ||\|

| * | 75cc50b8 readelf: avoid accidental|| * 3204d66e Import ELF Tool Chain
snapsh
* | | 641e5321 MFH                      |* | f56b32a7 Rename ELFOSABI_SYSV
to ELFO
|\| |                                   |* | f1fa8947 readelf: Correct typo
HPUS -
| * | 641e5321 Add config for RISC-V ISA|* | 5cdc13da addr2line: skip CUs
lacking


> > Would be good if you could run a script against all contrib prefixes and
> later
> > count the number of commits that a contrib-tree produces to see if
> something
> > weird happens.
>
> You mean try running `git subtree split` on each contrib prefix, and
> checking that the number of commits in each generated tree is
> sensible? For example, inspect any subtree with over say 500 commits?
>
> As a first pass for identifying contrib prefixes I tried:
>
> ls -1d contrib/* sys/contrib/* crypto/* sys/crypto/* cddl/* sys/cddl/*
> sys/gnu/*
>
> sys/crypto/ and the cddl ones aren't quite right, and I still need to
> check for additional hierarchy (e.g., if we have cases like
> contrib/netbsd/blocklist instead of contrib/blocklist)
>

Yes, we need to check all of them and it might need a manual mapping (or at
least a handful exceptions).


>
> > You can test both parents whether they are reachable from
> vendor/elftoolchain/dist,
>
> I'm hoping to find an algorithm that could be made general and
> submitted upstream, so that we could have something like git subtree
> split --initial --prefix=contrib/elftoolchain, and have the --initial
> calculate the --onto revision automatically. If we produce some
> bespoke tooling for FreeBSD though this branch name approach should
> work, but I think we'd have to have a map of contrib directory to
> vendor branch. I believe that some are not the same in contrib and
> vendor.
>

It might really be too bespoke for FreeBSD and we only need to do all of
this once, yes?


> I think we have three ways we can address this:
>
> 1. Change the svn2git process so that we don't trip over unpatched
> git-subtree's issue with mainline history leaking into the subtree.
> 2. Get Tom Clarkson's git-subtree patches into upstream git, or
> require that contrib maintainers use our own patched git until that
> happens.
> 3. Develop and use an alternate subtree splitter.
>
> I suppose there is also
> 4. Reconsider git subtree altogether (e.g. submodules).
> but I think there's little appetite for this.
>
> At this point I think that option 2 is the most straightforward, and
> I'm now reasonably confident that it will work as we want. With this
> being the case I'd say we should focus on tuning svn2git to produce
> "sensible" output without regard to how unpatched git-subtree handles
> the output. That is, I'd say I'm broadly happy with the state of
> conversion in cgit-beta today.
>

2 would be my preference as well. So someone will need to run subtree split
for all contrib software and let me know if there are blockers that the
svn2git
conversion could help out with (or we need to help out all subtree splits
with a
dozen carefully placed grafts and a rewrite. Doesn't sound too bad?)

Cheers
Uli


More information about the freebsd-git mailing list