Re: llvm10 build failure on Rpi3

From: bob prohaska <fbsd_at_www.zefox.net>
Date: Thu, 24 Jun 2021 16:01:09 UTC
[What about trying an older kernel? Details at end]
On Wed, Jun 23, 2021 at 11:02:02PM -0700, Mark Millard wrote:
> On 2021-Jun-23, at 21:30, bob prohaska <fbsd at www.zefox.net> wrote:
> 
> > On Wed, Jun 23, 2021 at 04:22:35PM -0700, Mark Millard wrote:
> >> On 2021-Jun-23, at 15:28, bob prohaska <fbsd at www.zefox.net> wrote:
> >> . . .
> > 
> >> 
> > [snipped for brevity]
> >> 
> >>>> For example, 0xA5u byte values might be the value that newly
> >>>> allocated memory is initialized to. Looking . . . man jemalloc
> >>>> (the memory allocator implementation used by FreeBSD) reports:
> >>>> 
> >>>>      opt.junk (const char *) r- [--enable-fill]
> >>>>          Junk filling. If set to "alloc", each byte of uninitialized
> >>>>          allocated memory will be initialized to 0xa5. If set to "free", all
> >>>>          deallocated memory will be initialized to 0x5a. If set to "true",
> >>>>          both allocated and deallocated memory will be initialized, and if
> >>>>          set to "false", junk filling be disabled entirely. This is intended
> >>>>          for debugging and will impact performance negatively. This option
> >>>>          is "false" by default unless --enable-debug is specified during
> >>>>          configuration, in which case it is "true" by default.
> >>>> 
> >>>> So, if you have junk filling enabled, I expect that you ran
> >>>> into a legitimate defect in the llvm-tblgen in use. Having
> >>>> Junk Filling disabled might be a workaround.
> >>>> 
> >>>> There is /etc/malloc.conf as a way of controlling the behavior:
> >>>> 
> >>>> ln -s 'junk:false' /usr/local/poudriere/poudriere-system/etc/malloc.conf
> >>>> 
> >>>> I suggest you retry building after getting the above in place.
> >>>> If it does not get the 0xA5A5A5A5u value, that would be
> >>>> more evidence of an uninitialized-memory defect in the llvm-tblgen
> >>>> involved.
> >>>> 
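As an aside, a tiny standalone C sketch of my own (not from the port
build; the deliberate read of uninitialized memory is only for
illustration) shows how the junk setting decides what an uninitialized
heap read returns:

    /* junktest.c -- observe jemalloc's junk filling */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        unsigned *p = malloc(sizeof *p);  /* contents indeterminate */
        if (p == NULL)
            return 1;
        /* With MALLOC_CONF=junk:alloc this prints 0xa5a5a5a5; with
         * junk:false it prints whatever the page happened to hold,
         * often 0 for freshly provided pages. */
        printf("%#x\n", *p);
        free(p);
        return 0;
    }

    cc -O0 -o junktest junktest.c
    MALLOC_CONF=junk:alloc ./junktest
    MALLOC_CONF=junk:false ./junktest
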
> >>> Done and running now. In the interim I tried building llvm10 using
> >>> make in /usr/ports, but it failed with another python conflict.
> >> 
> > The poudriere session just ended, with a somewhat different error:
> > 
> > In file included from /wrkdirs/usr/ports/devel/llvm10/work/llvm-10.0.1.src/lib/Target/AArch64/AArch64InstructionSelector
> > .cpp:312:
> > lib/Target/AArch64/AArch64GenGlobalISel.inc:1900:41: error: expected expression
> >        /*GIM_CheckRegBankForClass: @0*/, /*MI*/1, /*Op*/2, /*RC*//*AArch64::FPR64RegClassID: @0*/,
> >                                        ^
> > lib/Target/AArch64/AArch64GenGlobalISel.inc:1900:99: error: expected expression
> >        /*GIM_CheckRegBankForClass: @0*/, /*MI*/1, /*Op*/2, /*RC*//*AArch64::FPR64RegClassID: @0*/,
> >                                                                                                  ^
> > 2 errors generated.
> > [ 25% 1396/5364]
> > 
> > The last line is included as a fiducial indicator.  Two errors instead of
> > four, nothing about AMDGPU. 
> 
> You have a prior run that also showed only 2 errors:
> 
> http://www.zefox.org/~bob/poudriere/data/logs/bulk/main-default/2021-06-21_12h55m51s/logs/errors/llvm10-10.0.1_5.log
> 
> has:
> 
> lib/Target/AMDGPU/AMDGPUGenGlobalISel.inc:15822:50: error: expected expression
>         /*GIM_CheckRegBankForClass: @2779096485*/, /*MI*/0, /*Op*/0, /*RC*//*AMDGPU::VGPR_32RegClassID: @2779096485*/,
>                                                  ^
> lib/Target/AMDGPU/AMDGPUGenGlobalISel.inc:15822:118: error: expected expression
>         /*GIM_CheckRegBankForClass: @2779096485*/, /*MI*/0, /*Op*/0, /*RC*//*AMDGPU::VGPR_32RegClassID: @2779096485*/,
>                                                                                                                      ^
> 2 errors generated.
> 
> And a prior one that shows 6 errors but for AArch64 instead of AMDGPU:
> 
> http://www.zefox.org/~bob/poudriere/data/logs/bulk/main-default/2021-06-18_19h00m47s/logs/errors/llvm10-10.0.1_5.log
> 
> has:
> 
> lib/Target/AArch64/AArch64GenGlobalISel.inc:3760:50: error: expected expression
>         /*GIM_CheckRegBankForClass: @2779096485*/, /*MI*/1, /*Op*/1, /*RC*//*AArch64::FPR64RegClassID: @2779096485*/,
>                                                  ^
> lib/Target/AArch64/AArch64GenGlobalISel.inc:3760:117: error: expected expression
>         /*GIM_CheckRegBankForClass: @2779096485*/, /*MI*/1, /*Op*/1, /*RC*//*AArch64::FPR64RegClassID: @2779096485*/,
>                                                                                                                     ^
> lib/Target/AArch64/AArch64GenGlobalISel.inc:5735:50: error: expected expression
>         /*GIM_CheckRegBankForClass: @2779096485*/, /*MI*/0, /*Op*/1, /*RC*//*AArch64::GPR64RegClassID: @2779096485*/,
>                                                  ^
> lib/Target/AArch64/AArch64GenGlobalISel.inc:5735:117: error: expected expression
>         /*GIM_CheckRegBankForClass: @2779096485*/, /*MI*/0, /*Op*/1, /*RC*//*AArch64::GPR64RegClassID: @2779096485*/,
>                                                                                                                     ^
> lib/Target/AArch64/AArch64GenGlobalISel.inc:22981:50: error: expected expression
>         /*GIM_CheckRegBankForClass: @2779096485*/, /*MI*/0, /*Op*/1, /*RC*//*AArch64::GPR64spRegClassID: @2779096485*/,
>                                                  ^
> lib/Target/AArch64/AArch64GenGlobalISel.inc:22981:119: error: expected expression
>         /*GIM_CheckRegBankForClass: @2779096485*/, /*MI*/0, /*Op*/1, /*RC*//*AArch64::GPR64spRegClassID: @2779096485*/,
>                                                                                                                       ^
> 6 errors generated.
> ninja: build stopped: subcommand failed.
> *** Error code 1
> 
> It appears that the bug does not reproduce with identical
> details, but all of the examples without junk:false show
> @2779096485. (And the only run with junk:false tried so far
> shows @0 instead.)
> 
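Side note: that decimal value is just the 0xa5 junk-fill pattern
spelled out in base 10, which is easy to confirm:

    $ printf '%x\n' 2779096485
    a5a5a5a5
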
> Something is providing and/or using uninitialized memory.
> 
> There is the possibility that swapping out and back in
> sometimes does not provide pages with the intended content.
> I state that as an example of why we really cannot claim
> to know that llvm-tblgen itself is doing something wrong.
> I'm not claiming to know what is actually happening. But
> such a cause would fit with contexts that have more RAM,
> avoid much of the paging/swapping, and also do not
> see the problem.
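If it helps correlate things, I could log paging activity alongside
the next build attempt; a rough sketch (my own choice of commands):

    # once a minute, record swap usage and paging counters
    while true; do
        date
        swapinfo -h
        vmstat -s | grep -i -E 'swap|page'
        sleep 60
    done > swap-log.txt 2>&1
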
> 
> But as in some past examples, you may have exposed a
> problem with FreeBSD.
> 
> >> Interesting. I'm unable to see a:
> >> 
> >> /usr/local/poudriere/poudriere-system/etc/malloc.conf
> >> 
> >> via what you have published. But I've no clue if such
> >> an odd symbolic link would be expected to show up.
> 
> Still true, but . . .
> 
> Well, now: http://www.zefox.org/~bob/poudriere/
> shows a: junk:false
> 
> Note that this is at the same level as poudriere-system/
> is shown. You might want to look and see if the file
> system shows such a file at that level as well.
> 
> This did not show up until after the build attempt had
> finished from what I can tell.
> 
> > The link seems visible to find and ls: 
> > root@www:/usr/local/poudriere # find . -name malloc.conf
> > ./poudriere-system/etc/malloc.conf
> > root@www:/usr/local/poudriere # more ./poudriere-system/etc/malloc.conf
> > ./poudriere-system/etc/malloc.conf: No such file or directory
> > root@www:/usr/local/poudriere # ls -l ./poudriere-system/etc/malloc.conf
> > lrwxr-xr-x  1 root  wheel  10 Jun 23 14:27 ./poudriere-system/etc/malloc.conf -> junk:false
> > root@www:/usr/local/poudriere # 
> > 
> > The link seems invisible to cat and more, reporting "No such file...."
> 
> The link's target is the literal string junk:false, so anything
> that follows the link looks for a file of that name in the same
> directory and is not expected to find one.
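For what it's worth, the string jemalloc cares about is the link
target itself, so readlink shows it without trying to open anything:

    readlink /usr/local/poudriere/poudriere-system/etc/malloc.conf
    # should print: junk:false
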
> 
> > I'm not sure what might be profitably tried next..... Suggestions welcome!
> 
> First off, if the point is to get the RPi3B+ going
> more than it is to get evidence about the problem,
> I'd suggest booting an RPi4B with the same media
> (adjusting config.txt as necessary) and trying the
> build from that boot. If it builds, the media can
> be moved back to the RPi3B+ for other activity.
> The failed vs. built status does give some
> information about the problem. Built would suggest
> that paging/swapping was involved in the problem.
> Failed might suggest otherwise. (I do not know
> if there would be much paging/swapping, depending on
> how much RAM the RPi4B had.)
> 
> One experiment would be to use the same boot media on
> an RPi4B that had been told in config.txt to limit
> itself to 1 GiByte of RAM, and then to also try with all
> the RAM being allowed. If the first fails but the
> second works, that is probably nice evidence. If both
> fail, that also is probably nice evidence. For the other
> two combinations the implications are less clear.
> 
> (I'm not claiming that you have such a RPi4B that can
> be made available for the duration of such experiments.)
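For the 1 GiByte limit mentioned above, my understanding is that the
firmware's total_mem option in config.txt does that (worth
double-checking against the RPi firmware documentation):

    # config.txt on the RPi4B boot media
    total_mem=1024    # cap usable RAM at 1024 MiB for the limited run
    # remove (or comment out) the line for the all-RAM run
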
> 
> Another, messier, direction is testing under stable/13 and/or
> releng/13.0 vintages to see if the problem is somehow specific
> to main [so: 14], setting up a context as analogous as is
> reasonable to the one known to fail under main. The
> RPi4B two-RAM-sizes comparison/contrast type of test could
> also be used there.
> 
> There is also just repeating with junk:false a couple of
> times to see if there is evidence of variability like
> there is for the runs without junk:false. It is the simplest
> of the suggested tests, but likely the least informative.
> 
> None of this would be likely to get close to a short,
> small test that shows the problem. I've no clue how
> to target that at this point.
> 
How about booting an older kernel to see if that makes a difference?

 ls -dl /boot/kernel* reports
drwxr-xr-x  2 root  wheel  13824 Jun 18 18:15 /boot/kernel
drwxr-xr-x  2 root  wheel  13312 Jan  9 15:57 /boot/kernel.main-c255664-g4d64c7243d26
drwxr-xr-x  2 root  wheel  13312 Aug 29  2020 /boot/kernel.mmccam
drwxr-xr-x  2 root  wheel  13824 Jun  9 18:52 /boot/kernel.old
drwxr-xr-x  2 root  wheel  13312 Aug 27  2020 /boot/kernel.r364346
drwxr-xr-x  2 root  wheel  13312 Aug 29  2020 /boot/kernel.r364895
drwxr-xr-x  2 root  wheel  13312 Sep  7  2020 /boot/kernel.r365355

Most of these are probably too old to work at all, but Jun 9 and Jan 9
might possibly work; I'd expect kernel.old to work as well. ISTR the
previous success building chromium was early 2021 or before.
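
If I try that, presumably a one-shot test of an alternate kernel,
without editing loader.conf, would look something like:

    nextboot -k kernel.old     # use /boot/kernel.old for the next boot only
    shutdown -r now
    # to make it persistent instead:
    #   echo 'kernel="kernel.old"' >> /boot/loader.conf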

Thanks for reading, any suggestions appreciated!

bob prohaska