Re: llvm10 build failure on Rpi3

From: Mark Millard via freebsd-ports <freebsd-ports_at_freebsd.org>
Date: Sat, 03 Jul 2021 20:15:19 UTC

On 2021-Jul-3, at 11:25, bob prohaska <fbsd@www.zefox.net> wrote:

>>>> On 2021-Jul-2, at 19:23, Mark Millard <marklmi at yahoo.com> wrote:
>>> 
>>>>> Side note:
>>>>> 
>>>>> It looks like http://www.zefox.org/~bob/swaplogs/poudrierellvm10.log
>>>>> shows that you tried with:
>>>>> 
>>>>> Device          1K-blocks     Used    Avail Capacity
>>>>> /dev/da0s2b       1048576    25784  1022792     2%
>>>>> /dev/mmcsd0s2b    1048576    25124  1023452     2%
>>>>> Total             2097152    50908  2046244     2%
>>>>> 
> [hope the quotes are right!]
> 
> That's correct. The sequence of experiments ran something like this:
> 
> The Pi3 was configured with a pair of ~3 GB swap partitions, one on
> microSD, the other on the 1 TB mechanical hard disk. Make was not limited
> in the number of jobs it could run in parallel. OOMA was restrained by putting
> vm.pageout_oom_seq="4096"
> vm.pfault_oom_attempts="20"
> in /boot/loader.conf. The usual "excessive swap" warnings were presented
> during boot and ignored by me. 
> 
> Worlds and kernels built without trouble, so I tried building www/chromium
> using poudriere. It stopped in devel/llvm10 with the "expected expression"
> error and continued to stop there despite updating /usr/ports several times. 
> At no time were there any hints of swap problems. Resorting to a GENERIC
> self-hosted kernel made no difference. /usr/src was not tampered with. 

So you still have not tried an artifacts or snapshot kernel+world?

> Eventually I resorted to running make in devel/llvm10; to my surprise it
> ran to completion.

Interesting.

Was this -j4? -j1? -j2? Any other interesting characteristics
for how it was run?

It would be interesting to see whether building the same make-style
way in a chroot (or in a non-poudriere jail) also works.

> It also ran make package successfully. Again I tried to
> build just devel/llvm10 using poudriere, again getting "expected expression". 
> 
> At that point I resized the swap partitions to 1 GB each and tried poudriere
> on devel/llvm10. That got rid of the excessive swap warnings, but didn't help.
> Finally I placed 
> MAKE_JOBS_NUMBER=2 
> in /usr/local/etc/poudriere.d/make.conf and tried again. That still failed,
> still with "expected expression". 

I'll note that the running build shows Load Averages
of under 3. So the MAKE_JOBS_NUMBER=2 seems to be working.
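If you end up toggling that setting between experiments, a minimal
POSIX sh sketch like the following keeps exactly one MAKE_JOBS_NUMBER
line in the file instead of accumulating conflicting ones. The function
name set_make_jobs is hypothetical; the make.conf path is the one from
your message.

```shell
# Hypothetical helper: set MAKE_JOBS_NUMBER in a poudriere make.conf,
# replacing any earlier MAKE_JOBS_NUMBER line and keeping all others.
set_make_jobs() {
    conf=$1 n=$2
    if [ -f "$conf" ]; then
        # Copy every line except an existing MAKE_JOBS_NUMBER setting.
        # grep -v exits 1 when nothing is left; that is not an error here.
        grep -v '^MAKE_JOBS_NUMBER=' "$conf" > "$conf.tmp" || true
    else
        : > "$conf.tmp"
    fi
    echo "MAKE_JOBS_NUMBER=$n" >> "$conf.tmp"
    mv "$conf.tmp" "$conf"
}
```

Usage: set_make_jobs /usr/local/etc/poudriere.d/make.conf 2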

> Since devel/llvm10 had created a package successfully, I tried slipping a copy
> into poudriere's package directory, hoping it would find and use the package
> to make further progress. Unfortunately, poudriere seems to remember the failure
> and won't use the proffered package. 

After a bulk build completes successfully, the packages directory
tends to look something like this (using an example):

2# ls -FTla /usr/local/poudriere/data/packages/main-CA53-default/
total 12
drwxr-xr-x  3 root  wheel  512 Jul  3 07:19:32 2021 ./
drwxr-xr-x  4 root  wheel  512 Jul  1 19:25:44 2021 ../
lrwxr-xr-x  1 root  wheel   18 Jun 28 04:32:43 2021 .buildname@ -> .latest/.buildname
lrwxr-xr-x  1 root  wheel   20 Jun 28 04:32:43 2021 .jailversion@ -> .latest/.jailversion
lrwxr-xr-x  1 root  wheel   16 Jul  3 07:19:32 2021 .latest@ -> .real_1625321972
drwxr-xr-x  4 root  wheel  512 Jul  3 07:19:32 2021 .real_1625321972/
lrwxr-xr-x  1 root  wheel   11 Jun 28 04:32:43 2021 All@ -> .latest/All
lrwxr-xr-x  1 root  wheel   14 Jun 28 04:32:43 2021 Latest@ -> .latest/Latest
lrwxr-xr-x  1 root  wheel   17 Jun 28 04:32:43 2021 meta.conf@ -> .latest/meta.conf
lrwxr-xr-x  1 root  wheel   16 Jun 28 04:32:43 2021 meta.txz@ -> .latest/meta.txz
lrwxr-xr-x  1 root  wheel   23 Jun 28 04:32:43 2021 packagesite.txz@ -> .latest/packagesite.txz

But if a bulk run is in progress, or has finished after some package
failed to build, there is also a:

.building/

in there. That is what the message:

Using packages from previously failed build: ${PACKAGES}/.building

is about when poudriere bulk is started again. This is how
poudriere avoids rebuilding what already built successfully,
without modifying the prior successful bulk build (if any).

So poudriere would have expected the file for devel/llvm10's
build to be in that .building/ directory instead of down under
the .real_*/ directory.

(I've not checked if there is other record keeping in .building/
about the materials as well.)
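For checking which of those states a packages directory is in before
starting another bulk run, a minimal POSIX sh sketch follows. The
function name pkgdir_state is hypothetical, and it assumes only the
.latest symlink and .building/ directory layout described above.

```shell
# Hypothetical helper: report where .latest points (the last completed
# .real_* directory) and whether a .building/ directory from an
# in-progress or partially failed bulk run is present.
pkgdir_state() {
    pkgdir=$1
    if [ -L "$pkgdir/.latest" ]; then
        echo "latest: $(readlink "$pkgdir/.latest")"
    else
        echo "latest: none"
    fi
    if [ -d "$pkgdir/.building" ]; then
        echo "building: present"
    else
        echo "building: absent"
    fi
}
```

Usage, with the example directory shown earlier:
pkgdir_state /usr/local/poudriere/data/packages/main-CA53-default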

Going in a different direction: one way to force a build to
start over after a failure is to rm -fr PATH/.building
before starting a new bulk build. This might be appropriate
if one suspects a problem that did not stop a port's build
but produced something that later fails to operate
correctly.
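A minimal POSIX sh sketch of that step, with a guard so an empty
argument can never turn into rm -fr /.building. The function name
reset_failed_bulk is hypothetical; PATH above stays whatever your
packages directory is.

```shell
# Hypothetical helper: discard a failed bulk run's partial results so
# the next "poudriere bulk" starts over instead of reusing .building/.
reset_failed_bulk() {
    pkgdir=$1
    # Refuse an empty argument rather than removing /.building.
    [ -n "$pkgdir" ] || return 1
    if [ -d "$pkgdir/.building" ]; then
        rm -fr "$pkgdir/.building"
        echo "removed $pkgdir/.building"
    else
        echo "nothing to remove"
    fi
}
```

After that, a normal bulk run such as poudriere bulk -j main-CA53
devel/llvm10 (jail name from my example) would start devel/llvm10
from scratch.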

> It's still running, on lang/spidermonkey78.  

So lang/rust finished. That is interesting because it includes an
llvm build internally.

Also: had you updated to pick up the workaround for the rust
build failures on aarch64? I doubt it because it was
committed on 2021-July-02. See,

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=256864#c18

So the fact that you did not get the process crash/core dump
during lang/rust's build is interesting.

> There were no reboots between experiments.
> 
> My first suspicion is that I've somehow screwed up the poudriere setup, perhaps
> by a fumbled execution of poudriere jail -u, which I mistakenly thought was
> needed after updating /usr/ports.

Again, poudriere does not control memory initialization in
the processes in the builders.

> The fact that the stoppage reported looks like
> a syntax error specific to devel/llvm10 which is unaffected by swap pressure
> makes it seem unrelated to kernel or swap constraints. 

The files with the syntax errors are generated by llvm-tblgen
during the build, and it is llvm-tblgen's output that is corrupt,
showing evidence of having used memory that was not initialized
as it should have been.

> AIUI, the hardware of the Pi4 is considerably different from the Pi3 in terms
> of memory management, noted from an interview with Eben Upton on YouTube.

Why would Eben Upton be talking about FreeBSD's memory management?

I suspect that the talk is not about what you think it is about,
but about some aspects narrower than overall memory management.

> He 
> didn't go into any detail.  Whether that's relevant is unclear to me, but it 
> does suggest the Pi4, even with restricted memory, won't behave like a Pi3.

Various reserved memory areas and such will differ, but FreeBSD
uses the same general memory management code for both, not
completely separate code.

> Is there any sort of sanity test for the poudriere system? If I delete and
> re-create the existing jail can the existing package library be preserved
> and re-used? If not, that's OK, I'd just like to know beforehand.
> 

# poudriere jail -jNAME -d
# poudriere jail -c -jNAME -m null -M /WORLDPATH -S /SRCPATH -v 14.0-CURRENT

should work fine. But really, all that you are
doing (using an example from my environment)
is deleting and rewriting a few very small files
in a directory with the jail's name:

# ls -FTla /usr/local/etc/poudriere.d/jails/main-CA53/
total 36
drwxr-xr-x  2 root  wheel  512 Jul  2 21:03:23 2021 ./
drwxr-xr-x  3 root  wheel  512 Jul  2 21:03:23 2021 ../
-rw-r--r--  1 root  wheel   14 Jul  2 21:03:23 2021 arch
-rw-r--r--  1 root  wheel    5 Jul  2 21:03:23 2021 method
-rw-r--r--  1 root  wheel   33 Jul  2 21:03:23 2021 mnt
-rw-r--r--  1 root  wheel    2 Jul  2 21:03:23 2021 pkgbase
-rw-r--r--  1 root  wheel   14 Jul  2 21:03:23 2021 srcpath
-rw-r--r--  1 root  wheel   11 Jul  2 21:03:23 2021 timestamp
-rw-r--r--  1 root  wheel   13 Jul  2 21:03:23 2021 version

# cat /usr/local/etc/poudriere.d/jails/main-CA53/arch
arm64.aarch64

# cat /usr/local/etc/poudriere.d/jails/main-CA53/method
null

# cat /usr/local/etc/poudriere.d/jails/main-CA53/mnt
/usr/obj/DESTDIRs/main-CA53-poud

# cat /usr/local/etc/poudriere.d/jails/main-CA53/pkgbase 
0

# cat /usr/local/etc/poudriere.d/jails/main-CA53/srcpath 
/usr/main-src

# cat /usr/local/etc/poudriere.d/jails/main-CA53/timestamp 
1625285003

# cat /usr/local/etc/poudriere.d/jails/main-CA53/version 
14.0-CURRENT
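The individual cat commands above can be collapsed into one loop, which
is handy for saving or comparing the jail's metadata before and after a
delete/re-create. This is a minimal POSIX sh sketch; the function name
show_jail_meta is hypothetical and the default directory is just the
main-CA53 example from my environment.

```shell
# Hypothetical helper: print "name: contents" for each metadata file in
# a poudriere jail directory (arch, method, mnt, pkgbase, ...).
show_jail_meta() {
    dir=${1:-/usr/local/etc/poudriere.d/jails/main-CA53}
    for f in "$dir"/*; do
        # Skip anything that is not a regular file (or an unmatched glob).
        [ -f "$f" ] || continue
        printf '%s: %s\n' "${f##*/}" "$(cat "$f")"
    done
}
```

Usage: show_jail_meta /usr/local/etc/poudriere.d/jails/main-CA53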

Deleting/replacing the timestamp file may have rebuild
consequences, because it will appear to have changed (or
will simply be missing).

Nothing about any of those is going to change how memory
initialization works in llvm-tblgen's operation
for generating any *GenGlobalISel.inc files, except to
the extent that the timestamp forces some build
dependencies to be rebuilt from scratch first.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)