Re: llvm10 build failure on Rpi3

From: Mark Millard via freebsd-ports <freebsd-ports_at_freebsd.org>
Date: Sun, 04 Jul 2021 00:43:51 UTC
On 2021-Jul-3, at 14:54, bob prohaska <fbsd at www.zefox.net> wrote:

> On Sat, Jul 03, 2021 at 01:15:19PM -0700, Mark Millard wrote:
>> 
>> 
>> 
>> So you still have not tried an artifacts or snapshot kernel+world?
>> 
> Not yet. 
> 
>>> Eventually I resorted to running make in devel/llvm10, to my surprise it
>>> ran to completion.
>> 
>> Interesting.
>> 
>> Was this -j4? -j1? -j2? Any other interesting characteristics
>> for how it was run?
>> 
> Nothing special was done. IIRC, it was make -DBATCH > make.log in
> the background. From top's screen it looked like -j4. 
> 
>> It would be interesting to see if building in a chroot
>> in that make style also worked (or a non-poudriere jail).
>> 
> 
> Can you point me to instructions for doing the experiment?

I'll deal with this in a separate reply.

>>> It also ran make package successfully. Again I tried to
>>> build just devel/llvm10 using poudriere, again getting "expected expression". 
>>> 
>>> At that point I resized the swap partitions to 1 GB each and tried poudriere
>>> on devel/llvm10. That got rid of the excessive swap warnings, but didn't help.
>>> Finally I placed 
>>> MAKE_JOBS_NUMBER=2 
>>> in /usr/local/etc/poudriere.d/make.conf and tried again. That still failed,
>>> still with "expected expression". 
>> 
>> I'll note that the running build shows Load Averages
>> of under 3. So the MAKE_JOBS_NUMBER=2 seems to be working.
>> 
>>> Since devel/llvm10 had created a package successfully, I tried slipping a copy
>>> into poudriere's package directory, hoping it would find and use the package
>>> to make further progress. Unfortunately, poudriere seems to remember the failure
>>> and won't use the proffered package. 
>> 
> [large snip which convinced me to give up on tricking poudriere into
> using a package constructed by make] 
>> 
>> Going in a different direction, one way to force a build to
>> start over after a failure is to: rm -fr PATH/.building
>> before starting a new bulk build. This might be appropriate
> I'm missing something here: what does PATH represent? There's
> nothing called .building under /usr/local/poudriere, at least
> after the run finishes. 

Part of how this works is that .building/ is initially
populated with a shadow copy of the already existing
.latest/ mostly via use of hard links, with some top
level files actually copied.

If the status of the bulk run reaches stopped:done: then the
.building/ is mv'd (renamed) to be of the form .real_*/
with a new match for the * and then the links are adjusted
to point to the new .real_*/ and the old .real_*/ is
removed. In your context, this happens inside:

/usr/local/poudriere/data/packages/main-default/

So, yes, your run that reached stopped:done: no longer
has a .building/

By contrast, say you ^C the bulk run or that it reaches the
stopped:crashed: state instead of stopped:done: . Then the
.building/ would still be present, as would the pre-existing
.real_*/ and the links that use it. This is the
context for the next bulk run reporting:

"Using packages from previously failed build: ${PACKAGES}/.building"
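
The lifecycle described above can be sketched as a toy simulation in
a scratch directory. This is not poudriere's actual code: the function
names are hypothetical, a single .latest symlink stands in for "the
links", and the top level files that get copied rather than linked
are left out.

```shell
#!/bin/sh
# Toy model of the .latest/ -> .building/ -> .real_*/ lifecycle.
# NOT poudriere's implementation; all names here are hypothetical.

# One finished earlier build: .real_1/ plus a .latest link to it.
setup_pkgdir() {
    pkgdir=$1
    mkdir -p "$pkgdir/.real_1/All"
    echo "old package" > "$pkgdir/.real_1/All/foo-1.0.pkg"
    ln -s .real_1 "$pkgdir/.latest"
}

# Start of a bulk run: shadow .latest/ into .building/ via hard links.
start_build() {
    pkgdir=$1
    mkdir -p "$pkgdir/.building/All"
    for f in "$pkgdir"/.latest/All/*; do
        [ -e "$f" ] || continue          # nothing to shadow yet
        ln "$f" "$pkgdir/.building/All/"
    done
}

# stopped:done: case: rename .building/, repoint the link, and
# remove the old tree. After a ^C or a crash this step never runs,
# so .building/ is left behind for the next bulk run to find.
commit_build() {
    pkgdir=$1 new=$2
    old=$(readlink "$pkgdir/.latest")
    mv "$pkgdir/.building" "$pkgdir/$new"
    rm "$pkgdir/.latest"
    ln -s "$new" "$pkgdir/.latest"
    rm -rf "${pkgdir:?}/$old"
}
```

In your context the PATH in "rm -fr PATH/.building" would be
/usr/local/poudriere/data/packages/main-default , and the directory
only exists there while a run is in progress or after a run that
did not reach stopped:done: .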


>> if one suspects a problem of a kind that did not stop a
>> build but produced something for a build that fails to operate
>> correctly.
>> 
> Such as a corrupt llvm-tblgen?

Yep, possibly via it depending on something else that
has problems.

>> So lang/rust finished. That is interesting because it includes an
>> llvm build internally.
>> 
> 
> Does that build invoke the same llvm-tblgen?

Every devel/llvm* build builds its own llvm-tblgen .
lang/rust would build its own too. And the system
llvm support builds its own as well.

> [snip] 
>> Again, poudriere does not control memory initialization in
>> the processes in the builders.
>> 
> 
> For some reason I got the idea that whatever asked for memory to use
> was responsible for initializing it.

Part of the point of having memory management libraries
provide a way to be told to fill in bytes such as 0xA5u is
to get hints about contexts that end up with memory not
explicitly initialized by the requesting program.

That is why I had you try the contrasting junk:false
case in /etc/malloc.conf . The results showed what the
memory allocation library initialized with instead of
something specific to the code requesting the allocation.
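
A minimal sketch of one way to look for such hints in a generated
file, assuming the junk fill ends up literally as 0xa5 bytes in the
output. The function name is hypothetical, and a nonzero count is
only a hint, not proof.

```shell
#!/bin/sh
# Hypothetical helper: count the bytes equal to 0xa5 in a file.
# od dumps every byte as two hex digits (-v keeps repeated lines),
# tr puts one byte per line, grep -c counts the exact matches.
count_junk_bytes() {
    od -An -v -tx1 "$1" | tr -s ' ' '\n' | grep -c '^a5$' || true
}
```

Something like count_junk_bytes on a suspect *GenGlobalISel.inc
should print 0 for pure ASCII text, since 0xa5 is not an ASCII
character.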

> Certainly not the kernel.....

The kernel fills in bytes into some user-space memory
as part of doing various requested operations. In such
cases it is potentially possible for the kernel to not
have filled-in the memory like it should have.

It is also possible for the kernel to overwrite bytes in
user-space memory that it should not touch. There is an
ongoing example of this for the 32-bit powerpc kernels
used on old PowerMacs.

>>> The fact that the stoppage reported looks like
>>> a syntax error specific to devel/llvm10 which is unaffected by swap pressure
>>> makes it seem unrelated to kernel or swap constraints. 
>> 
>> The files with the syntax errors are ones generated by llvm-tblgen
>> during the build and it is the output of llvm-tblgen that is corrupt,
>> showing evidence of having used memory not initialized like it should
>> have been.
>> 
> 
> Wouldn't that point suspicion at llvm-tblgen, of whatever version
> LLVM is actually doing the work? 

It points at llvm-tblgen and/or something(s) that llvm-tblgen
depends on. Either way, the observed failure is from the
llvm-tblgen output being incorrect and later complained about.

devel/llvm10 builds its own llvm-tblgen for its own use. Each
devel/llvm* does. (As does the system's llvm*.)

There is also the variability in which llvm-tblgen output is
messed up: it is always some example of:

lib/Target/*/*GenGlobalISel.inc

but what the *'s match tends to vary from build attempt
to build attempt. It suggests that some sort of race condition
is involved.
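
To make that variation visible across attempts, something like the
following could tally which generated files were complained about
in a set of saved logs. It is only a sketch: the function name is
hypothetical, and it assumes that each complaint line contains both
the lib/Target/*/*GenGlobalISel.inc path and the text
"error: expected expression".

```shell
#!/bin/sh
# Hypothetical helper: tally the *GenGlobalISel.inc files named in
# "expected expression" errors across one or more build logs.
# Assumes the path and the error text appear on the same line.
failing_inc_files() {
    grep -h 'error: expected expression' "$@" |
        grep -o 'lib/Target/[^/]*/[^/: ]*GenGlobalISel\.inc' |
        sort | uniq -c
}
```

Given one log per failed attempt, each output line is a count
followed by a path. Several different paths would fit the
race-condition suspicion; a single repeated path would argue
against it.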

>>> AIUI, the hardware of the Pi4 is considerably different from the Pi3 in terms
>>> of memory management, noted from an interview with Eben Upton on YouTube.
>> 
>> Why would Eben Upton be talking about FreeBSD's memory management?
>> 
> He was talking about the Pi4 hardware and how it differed from the Pi3

Which is not memory management as such.

>> I suspect that the talk is not about what you think it is about,
>> but some narrower aspects than the overall memory management.
>> 
> 
> I thought it had something to do with added DMA capability. The video is at
> https://www.youtube.com/watch?v=hyj-7mTnumI
> In light of the discussion about llvm-tblgen I'm doubtful it's relevant,
> but it's not the worst way to waste an hour.
> 
>> 
>>> Is there any sort of sanity test for the poudriere system? If I delete and
>>> re-create the existing jail can the existing package library be preserved
>>> and re-used? If not, that's OK, I'd just like to know beforehand.
>>> 
>> 
>> # poudriere jail -jNAME -d
>> # poudriere jail -c -jNAME -m null -M /WORLDPATH -S /SRCPATH -v 14.0-CURRENT
>> 
>> should work fine. But really all that you are
>> doing (using an example from my environment)
>> is deleting and rewriting a few very small files
>> in a directory with the jail's name:
>> 
> So, in my case /usr/local/poudriere/poudriere-system? 

After the delete, the create command would be:

poudriere jail -c -jNAME -m null -M /usr/local/poudriere/poudriere-system -S /usr/src -v 14.0-CURRENT

Same as in your: http://www.zefox.org/~bob/readme

> (using the nomenclature in your sample instructions).
> That would leave /usr/local/poudriere/data intact....

Yep. The delete does have an option (-C ???) for causing
more to be deleted under /usr/local/poudriere/data/ .

(Despite documentation claims otherwise, it did not
seem to delete packages when requested.)

> I'm starting to understand why you think it unlikely
> to help.
> 
>> The deletion/replacement of timestamp may have rebuild
>> consequences from appearing to have changed (or just
>> being missing).
>> 
> If timestamps guide decisions on what to make and when,
> that might be significant. Not sure how I might've screwed
> them up, but in my hands anything is possible 8-)

I took a quick look and did not notice any timestamp
comparisons controlling anything.

>> Nothing about any of those is going to change how memory
>> initialization is working in llvm-tblgen's operation
>> for generating any *GenGlobalISel.inc files, other than
>> if the timestamp forces some sort of rebuild from scratch
>> of some build dependencies first.
>> 
> Maybe this should be obvious, but which llvm-tblgen is in 
> action? the one from the system, (12.0.1) or something
> else?
> 

devel/llvm10 builds its own llvm-tblgen and uses it.
Every devel/llvm* build builds its own llvm-tblgen .

Looking in the .log file for a build there are lines
containing commands that start out with (from my
example devel/llvm10 build context):

/wrkdirs/usr/ports/devel/llvm10/work/.build/bin/llvm-tblgen

Before any of those, there are commands associated with
building that bin/llvm-tblgen .
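
A small sketch for finding where in a log the built tool is first
used, so the earlier lines can be inspected for how it was built.
The function name is hypothetical, and matching on the full path
followed by a space is just an assumption about how the invocations
appear in the log.

```shell
#!/bin/sh
# Hypothetical helper: print the line number of the first log line
# that invokes the given tool (fixed-string match on "path ").
first_tool_use() {
    log=$1 tool=$2
    grep -n -F -- "$tool " "$log" | head -n 1 | cut -d: -f1
}
```

For example, with the log saved as build.log (a name made up here):

first_tool_use build.log /wrkdirs/usr/ports/devel/llvm10/work/.build/bin/llvm-tblgen

It prints nothing if the log has no such line.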

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)