Re: 13.2 BETA2: how do debug META_MODE?

From: Peter <pmc_at_citylink.dinoex.sub.org>
Date: Tue, 21 Feb 2023 12:55:33 UTC
On Mon, Feb 20, 2023 at 08:44:59PM -0800, Mark Millard wrote:
! Peter <pmc_at_citylink.dinoex.sub.org> wrote on
! Date: Tue, 21 Feb 2023 03:45:12 UTC :
! 
! > on /some/ of my nodes, META_MODE seems not being honored anymore:
! > I had to build them another time, and the lengthy lib/clang gets
! > built all over again (tried two times).
! > This is so since 13.2 (BETA2). It did work in 13.1 (RELENG), at least
! > according to the timing from the logfiles. 
! > 
! > Now I'm trying to figure out the difference, because I have some
! > nodes where it appears to more-or-less work (have seen buildworld
! > take 5 minutes), and others where it doesn't (take an hour to build).
! > The thing is scripted, so it is not so very likely an operator error
! > (while not impossible either).
! > 
! > But it seems difficult to figure out details: "make -n" seems to not
! > care about META_MODE, while META_MODE suppresses all useful output from
! > make. And the docs say there are *.meta files (yes there are), but no
! > info about how to verify their content, or how to get make tell what
! > it is going to do and why (and the buildworld is not the most easy
! > to understand target)...
! > 
! > So, some inspiration would be welcome...
! 
! On thing to check on is if filemon.ko is loaded and operational.
! META_MODE greatly depends on it.

That should be the case - 'kldstat' shows it (and I've seen warnings
where it didn't).

! Another thing to know is that the following are very different
! for what all is built  for the "(again #0)" line vs. the other
! two "again" lines, using buildworld as an example context.
! Imagine here the the first buildworld rebuilds llvm/clang
! materials.
! 
! # cd /usr/src/
! # env WITH_META_MODE=yes make buildworld
! # env WITH_META_MODE=yes make installworld
! # env WITH_META_MODE=yes make buildworld (again #0)
! ## no more rebuilds below?
! # env WITH_META_MODE=yes make buildworld (again #1)
! # env WITH_META_MODE=yes make buildworld (again #2)

But what is the difference between #0 and #1?
I am actually grabbing src and obj and zfs-send them away
to the actual install target, then installworld to DESTDIR
which becomes the new root for the next update cycle.
Jails are a bit different because not all of them get the
compiler installed - and jails do have the problem now: if
they get the compiler, it always rebuilt.

According to the ./UPDATING paper, one should do build + install
+ build-again + install-again. That is probably outdated, but on
Release-Changes I'm doing it that way.
During the build-again step I noticed that the META_MODE did already
work, things were fast - but only for the base nodes, not for the
jails.

! Unfortunately, the some of the install activity registers
! as activity that is to cause later rebuild activity:
! updated file dates.

Okay. I tried to find the offensive timestamps, and the problem
is: there aren't any. :(
I grabbed one node that does build fast as expected, and one that
doesn't, and put the single-core output of buildworld side-by-side.
and this is where the difference starts:

[fast builds]
===> kerberos5/tools/make-roken (obj,all,install)
Building /usr/obj/usr/src/amd64.amd64/tmp/obj-tools/kerberos5/tools/make-roken/make-roken.c
Building /usr/obj/usr/src/amd64.amd64/tmp/obj-tools/kerberos5/tools/make-roken/make-roken.o
Building /usr/obj/usr/src/amd64.amd64/tmp/obj-tools/kerberos5/tools/make-roken/make-roken
Building /usr/obj/usr/src/amd64.amd64/tmp/obj-tools/kerberos5/tools/make-roken/_proginstall
===> kerberos5/lib/libroken (obj,all,install)
Building /usr/obj/usr/src/amd64.amd64/tmp/obj-tools/kerberos5/lib/libroken/libroken.a
building static roken library
Building /usr/obj/usr/src/amd64.amd64/tmp/obj-tools/kerberos5/lib/libroken/_libinstall
Building /usr/obj/usr/src/amd64.amd64/tmp/obj-tools/kerberos5/lib/libroken/_INCSINS
===> kerberos5/lib/libvers (obj,all,install)
Building /usr/obj/usr/src/amd64.amd64/tmp/obj-tools/kerberos5/lib/libvers/libvers.a
building static vers library

[rebuilds it all]
===> kerberos5/tools/make-roken (obj,all,install)
Building /usr/obj/usr/src/amd64.amd64/tmp/obj-tools/kerberos5/tools/make-roken/make-roken.c
Building /usr/obj/usr/src/amd64.amd64/tmp/obj-tools/kerberos5/tools/make-roken/make-roken.o
Building /usr/obj/usr/src/amd64.amd64/tmp/obj-tools/kerberos5/tools/make-roken/make-roken
Building /usr/obj/usr/src/amd64.amd64/tmp/obj-tools/kerberos5/tools/make-roken/_proginstall
===> kerberos5/lib/libroken (obj,all,install)
Building /usr/obj/usr/src/amd64.amd64/tmp/obj-tools/kerberos5/lib/libroken/roken.h
Building /usr/obj/usr/src/amd64.amd64/tmp/obj-tools/kerberos5/lib/libroken/base64.o
Building /usr/obj/usr/src/amd64.amd64/tmp/obj-tools/kerberos5/lib/libroken/copyhostent.o
Building /usr/obj/usr/src/amd64.amd64/tmp/obj-tools/kerberos5/lib/libroken/ecalloc.o
Building /usr/obj/usr/src/amd64.amd64/tmp/obj-tools/kerberos5/lib/libroken/emalloc.o
...

Looking at the first concerned file,
/usr/obj/usr/src/amd64.amd64/tmp/obj-tools/kerberos5/lib/libroken/roken.h
and it's *.meta file, the timestamps aren't really different.
In fact, those from the *.meta file are all in the future - the
question seems to become rather: why did that ever work? ;)

# Meta data file /usr/obj/usr/src/amd64.amd64/tmp/obj-tools/kerberos5/lib/libroken/roken.h.meta
CMD make-roken > roken.h
CWD /usr/obj/usr/src/amd64.amd64/tmp/obj-tools/kerberos5/lib/libroken
TARGET roken.h
-- command output --

-- filemon acquired metadata --
# filemon version 5
# Target pid 1754
# Start 1676683461.944305
V 5
E 1757 /bin/sh
R 1757 /etc/libmap.conf
R 1757 /var/run/ld-elf.so.hints
R 1757 /lib/libedit.so.8
R 1757 /lib/libc.so.7
R 1757 /lib/libncursesw.so.9
R 1757 /usr/share/locale/en_US.UTF-8/LC_CTYPE
F 1757 1758
W 1758 roken.h
E 1758 /usr/obj/usr/src/amd64.amd64/tmp/legacy/usr/sbin/make-roken
X 1758 0 0
X 1757 0 0
# Stop 1676683461.954451
# Bye bye

The /usr/obj/usr/src/amd64.amd64/tmp/legacy/usr/sbin/make-roken in
this output is a different one than what was built right beforehand
(in fact there are three or four of them).

! There are also issues of sort of a feedback loop: rm
! ends up updated (deleted and replaced) by install but rm
! was also listed as part of the sequence of replacing some
! other files. Result? The rm removal/replacement ends up
! meaning the files are to be regenerated, not just recopied.
! There is a long list of such commands, not just rm.
! 
! "again #0" will rebuild llvm/clang. The other two "again"s
! will not.

Strangely, it didn't, up to now. And it still doesn't on the
base nodes.

And as I am just juggling around with zfs clones, and fixup the
src timestamps from the commitlogs anyway, I could probably fixup this
also - if I only get to what is actually happening.

! See:
! 
! https://lists.freebsd.org/pipermail/freebsd-current/2021-January/078488.html
! 
! and:
! 
! https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=257616 .

Thank You, that's exactly the inspiration I was looking for!
Diving back in...