Re: [package - 130arm64-default][lang/gcc12-devel] Failed for gcc12-devel-12.0.1.s20220306_2 in build/runaway

From: Mark Millard <marklmi_at_yahoo.com>
Date: Sun, 27 Mar 2022 15:52:08 UTC
On 2022-Mar-26, at 16:15, Mark Millard <marklmi@yahoo.com> wrote:

> On 2022-Mar-26, at 14:37, Mark Millard <marklmi@yahoo.com> wrote:
> 
>> On 2022-Mar-26, at 13:16, Mark Millard <marklmi@yahoo.com> wrote:
>> 
>>> On 2022-Mar-26, at 12:35, Mark Millard <marklmi@yahoo.com> wrote:
>>> 
>>>> On 2022-Mar-26, at 07:26, Dimitry Andric <dim@FreeBSD.org> wrote:
>>>> 
>>>>> On 26 Mar 2022, at 15:16, pkg-fallout@freebsd.org <pkg-fallout@FreeBSD.org> wrote:
>>>>>> 
>>>>>> You are receiving this mail as a port that you maintain
>>>>>> is failing to build on the FreeBSD package build server.
>>>>>> Please investigate the failure and submit a PR to fix
>>>>>> build.
>>>>>> 
>>>>>> Maintainer:     toolchain@FreeBSD.org
>>>>>> Log URL:        http://ampere3.nyi.freebsd.org/data/130arm64-default/60ab72786154/logs/gcc12-devel-12.0.1.s20220306_2.log
>>>>>> Build URL:      http://ampere3.nyi.freebsd.org/build.html?mastername=130arm64-default&build=60ab72786154
>>>>> 
>>>>> So there isn't any actual error message in this log, except at the end:
>>>>> 
>>>>> ...
>>>>> =>> Cleaning up wrkdir
>>>>> ===>  Cleaning for gcc12-devel-12.0.1.s20220306_2
>>>>> Killed
>>>>> build of lang/gcc12-devel | gcc12-devel-12.0.1.s20220306_2 ended at Sat Mar 26 14:16:58 UTC 2022
>>>>> build time: 12:31:35
>>>>> !!! build failure encountered !!!
>>>>> 
>>>>> It looks like the last command being run before "Killed" is the cc1plus
>>>>> executable being linked with LTO, so I am assuming the build is killed
>>>>> due to an out-of-memory condition?
>>>>> 
>>>>> But this is only visible to people that have access to the machine the
>>>>> poudriere instance is running on. Can somebody with access please check?
>>>>> 
>>>> 
>>>> I do not have access but I've started a poudriere build
>>>> of my own on a HoneyComb. I've a patched top that monitors
>>>> and reports various Maximum Observed (MaxObs????) figures,
>>>> 64 GiBytes of RAM and slightly over 246 GiBytes of swap.
>>>> So hopefully it will report on about how big the memory use
>>>> gets. But it is allowed to use all 16 cores and there will
>>>> be no competing bulk builds using resources. So not a match
>>>> to the build server context.
>>>> 
>>>> Note: It is a ZFS context, so MaxObsWired is normally large
>>>> and shrinks over times where memory needs to be used for
>>>> other things. So the primary memory figures would be:
>>>> 
>>>> MaxObsSwapUsed (if any)
>>>> MaxObsActive
>>>> MaxObs(Act+Lndry+SwapUsed)
>>>> 
>>>> 
>>>> Side Note:
>>>> 
>>>> http://ampere3.nyi.freebsd.org/build.html?mastername=130arm64-default&build=60ab72786154
>>>> 
>>>> reports a Time of 11:48:41 but the log reports "build time: 12:31:35".
>>>> My guess is that processing the log file for extracting the type of
>>>> error makes some (much?) of the difference. (Type being runaway_process
>>>> in this case.)
>>>> 
>>>> 
>>> 
>>> I did just observe a cc1plus take somewhat over 30min
>>> of CPU time before completing and the lto1 related activity
>>> starting. It was under 5 GiBytes MaxObs(Act+Lndry+SwapUsed)
>>> [No swap use observed] before the lto1 related activity
>>> started.
>>> 
>>> For the lto1 related activity MaxObs(Act+Lndry+SwapUsed)
>>> has, so far, gotten up to around 12 GiBytes, still
>>> no swap use observed:
>>> 
>>> 12079Mi MaxObsActive
>>> 12278Mi MaxObs(Act+Lndry+SwapUsed)
>>> 
>>> I'll note that:
>>> 
>>> last pid: . . .;  load averages:  . . . MaxObs:  28.02,  16.88,  15.82
>>> . . . threads:   . . . running, . . . sleeping, 77 MaxObsRunning
>>> 
>>> So, on the timescale of the first load average, it does
>>> not always stay limited to the hardware threads available.
>>> 
>>> No process with sustained CPU activity sticks around across
>>> the lto1 activity. So I'll not be able to observe much about
>>> cpu time.
>>> 
>>> The elasped time doing lto1 activity has been going for a
>>> while but I'm unlikely to be able to observe its end happen.
>>> So I'll likely not have a good clue about that.
>>> 
>> 
>> Looks it spent about 1.5 or so hours on the particular block of
>> lto1 related activity. For reference, somewhat after that:
>> 
>> last pid: . . .;  load averages:  . . . MaxObs:  28.02,  17.04,  16.87
>> 
>> The 16 core are Cortex-A72's.
>> 
>> The following did not change (so far):
>> 
>> 12079Mi MaxObsActive
>> 12278Mi MaxObs(Act+Lndry+SwapUsed)
>> 
>> Still no observed swap use reported. 
>> 
>> 
>> The build is continuing. The build phase has been a little
>> over 2.5 hr so far.
>> 
> 
> I'll note that:
> 
> http://ampere1.nyi.freebsd.org/build.html?mastername=130arm64-default&build=d09582c49c48
> 
> reports for lang/gcc12-devel a build time of 21:53:01 .
> 
> (So lang/gcc12-devel has built after LTO was enabled.)
> 
> 
> Back to my lang/gcc12-devel build . . .
> 
> The earlier large block of lto1 activity was from 7 .usr/.local/bin/ld 's
> going in parallel. (I had a ps -auxd output in a file to look at.) Another
> block of 7 is going and taking significant time. (Seen via top.)
> 
> [Waiting for it to get past that block of lto1 related activity. . . .]
> 
> Hmm. I was not there to observe the end of the lto1 activity and what
> immediately followed. It is now in package:
> 
> [00:26:46] [01] [00:00:00] Building lang/gcc12-devel | gcc12-devel-12.0.1.s20220306_1
> . . .
> load: 1.36  cmd: sh 10124 [nanslp] 15151.01r 21.72u 70.93s 0% 4096k
> mi_switch+0x104 sleepq_catch_signals+0x41c sleepq_timedwait_sig+0x14 _sleep+0x1c0 kern_clock_nanosleep+0x1ac sys_nanosleep+0x3c do_el0_sync+0x5cc handle_el0_sync+0x40 
> [13_0R-CA72-default] [2022-03-26_11h31m18s] [parallel_build:] Queued: 19 Built: 18 Failed: 0  Skipped: 0  Ignored: 0  Fetched: 0  Tobuild: 1   Time: 04:12:31
> ID  TOTAL              ORIGIN   PKGNAME                          PHASE PHASE    TMPFS     CPU% MEM%
> [01] 03:45:51 lang/gcc12-devel | gcc12-devel-12.0.1.s20220306_1 package 00:08:12 32.00 KiB 100% 0.2%
> [04:12:37] Logs: /usr/local/poudriere/data/logs/bulk/13_0R-CA72-default/2022-03-26_11h31m18s
> 
> So somewhat over 3 hrs to reach the package stage.
> 
> The following have not changed since the prior reported figures:
> 
> last pid: . . .;  load averages:  . . . MaxObs:  28.02,  17.04,  16.87
> . . . threads:   . . . running, . . . sleeping, 77 MaxObsRunning
> and:
> 12079Mi MaxObsActive
> 12278Mi MaxObs(Act+Lndry+SwapUsed)
> (No report of observed swap usage.)
> 
> [Waiting for the "Finished" message from poudriere. . . .]
> 
> [04:33:13] [01] [04:06:27] Finished lang/gcc12-devel | gcc12-devel-12.0.1.s20220306_1: Success
> 

Just for reference: I tested the build time for reverting the
code in my context:

# git -C /usr/ports/ diff lang/gcc12-devel/Makefile
diff --git a/lang/gcc12-devel/Makefile b/lang/gcc12-devel/Makefile
index 644abf2cbb86..fab28c952f80 100644
--- a/lang/gcc12-devel/Makefile
+++ b/lang/gcc12-devel/Makefile
@@ -83,7 +83,7 @@ CONFIGURE_OUTSOURCE=  yes
 .if empty(PORT_OPTIONS:MBOOTSTRAP)
 CONFIGURE_ARGS+=--disable-bootstrap
 .else
-CONFIGURE_ARGS+=--with-build-config=bootstrap-lto-noplugin
+CONFIGURE_ARGS+=--with-build-config=bootstrap-debug
 ALL_TARGET=    bootstrap-lean
 .endif
 INSTALL_TARGET=        install-strip

The result was 01:38:45 for using bootstrap-debug:

[01:39:00] [01] [01:38:45] Finished lang/gcc12-devel | gcc12-devel-12.0.1.s20220306_1: Success

instead of the earlier 04:06:27 for using bootstrap-lto-noplugin :

[04:33:13] [01] [04:06:27] Finished lang/gcc12-devel | gcc12-devel-12.0.1.s20220306_1: Success


===
Mark Millard
marklmi at yahoo.com