[Bug 268760] lang/gcc12 build renders system unresponsive (12-core)

From: <bugzilla-noreply_at_freebsd.org>
Date: Thu, 05 Jan 2023 03:38:32 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=268760

            Bug ID: 268760
           Summary: lang/gcc12 build renders system unresponsive (12-core)
           Product: Ports & Packages
           Version: Latest
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: Individual Port(s)
          Assignee: salvadore@freebsd.org
          Reporter: pmc@citylink.dinoex.sub.org
             Flags: maintainer-feedback?(salvadore@freebsd.org)

hw.ncpu: 12
hw.physmem: 19427618816

last pid: 16312;  load averages: 105.17, 100.93, 76.14  up 0+02:23:05  21:18:12
218 processes: 144 running, 55 sleeping, 10 zombie, 9 lock
CPU:  0.0% user,  0.0% nice, 97.6% system,  2.4% interrupt,  0.0% idle
Mem: 9017M Active, 3139M Inact, 939M Laundry, 4536M Wired, 1412M Buf, 389M Free
ARC: 613M Total, 288M MFU, 125M MRU, 128K Anon, 3744K Header, 195M Other
     313M Compressed, 504M Uncompressed, 1.61:1 Ratio
Swap: 20G Total, 20G Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
16221 root          1  40    0  1434M  1134M *bufob   8   1:59  18.15% lto1
16223 root          1  44    0  1434M  1134M *bufob   7   2:34  17.24% lto1
16212 root          1  49    0  1413M  1024M CPU11   11   2:42  15.81% lto1
16266 root          1  41    0  1478M  1156M *bufob  11   2:14  14.60% lto1
16215 root          1  38    0  1413M  1023M *bufob   4   2:34  14.41% lto1
16194 root          1  47    0  1413M  1029M RUN      4   3:04  14.41% lto1
16191 root          1  41    0  1418M  1040M *bufob   5   2:26  14.18% lto1
16210 root          1  44    0  1413M  1024M *bufob   9   2:13  14.16% lto1
16179 root          1  41    0  1413M  1030M *bufob   2   2:50  14.14% lto1
16205 root          1  41    0  1413M  1023M *bufob   3   2:30  13.90% lto1
16224 root          1  36    0  1434M  1133M *bufob   6   2:23  13.86% lto1
16181 root          1  46    0  1413M  1031M *bufob  11   2:21  13.72% lto1
16257 root          1  41    0  1478M  1158M *bufob  10   2:38  13.68% lto1

# ps ax | grep lto | wc    ## and see the insane loadavg above!
     153    1070    8638

It is also necessary to tune the OOM killer to get this running at all:
  vm.pageout_oom_seq=5120
  vm.oom_pf_secs=180
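For reference, a minimal sketch of applying those tunables at runtime with
sysctl(8); the values are taken verbatim from above, and persisting them via
/etc/sysctl.conf is an assumption about the setup, not something stated in
this report:

```shell
# Raise the page-out OOM threshold so the OOM killer does not fire
# during the heavily parallel LTO phase (values as used in this report):
sysctl vm.pageout_oom_seq=5120
sysctl vm.oom_pf_secs=180

# Assumed persistence step: append the same tunables to /etc/sysctl.conf
# so they survive a reboot.
printf 'vm.pageout_oom_seq=5120\nvm.oom_pf_secs=180\n' >> /etc/sysctl.conf
```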

In comparison, building on only 8 cores does not show this pathological
behaviour (both instances were run simultaneously on the same platform):

hw.ncpu: 8
hw.physmem: 14239264768

last pid: 31114;  load averages:  9.94, 10.19, 10.00    up 0+02:46:40  21:42:02
73 processes:  9 running, 64 sleeping
CPU: 97.9% user,  0.0% nice,  1.6% system,  0.4% interrupt,  0.0% idle
Mem: 792M Active, 5544M Inact, 1675M Laundry, 3088M Wired, 1248M Buf, 2106M Free
ARC: 859M Total, 529M MFU, 132M MRU, 128K Anon, 4725K Header, 193M Other
     558M Compressed, 1107M Uncompressed, 1.99:1 Ratio
Swap: 20G Total, 447M Used, 20G Free, 2% Inuse

  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
31108 root          1 102    0   190M   154M CPU6     6   0:34 103.85% lto1
31097 root          1 103    0   241M   199M CPU5     5   0:50 100.30% lto1
31066 root          1 103    0   285M   246M CPU7     7   2:07 100.28% lto1
31091 root          1 103    0   206M   170M CPU0     0   1:03  99.91% lto1
31119 root          1  88    0   152M   115M RUN      1   0:08  98.78% lto1
31103 root          1 102    0   211M   173M CPU4     4   0:39  97.62% lto1
31086 root          1 102    0   290M   251M CPU3     3   1:04  97.25% lto1
31114 root          1 100    0   149M   112M CPU2     2   0:26  97.02% lto1
31109 root          1  20    0    14M  3396K CPU1     1   0:01   0.98% top
  638 ntpd          1  20    0    22M  1444K select   5   0:01   0.01% ntpd

# ps ax | grep lto | wc
       9      62     434

The build is:
    ports:      ba2be1158c403a13cab32b446555e8d8b59cbc57 (2023Q1)
    src: 12core 753d65a19a5541d532aff3cdd1089246fc5d2b8b (releng/13.1)
          8core d290399b6ce99c99cb1b9d1cdadc605172723561 (stable/13)

Overall compute usage (the build takes longer on 12 cores and consumes almost
twice as many CPU cycles, nearly all of the difference in system time):

ncpu 12
real 22911.77
user 158470.70
sys 141767.89
   2022732  maximum resident set size
     29106  average shared memory size
      1710  average unshared data size
       232  average unshared stack size
 275316289  page reclaims
    106509  page faults
         0  swaps
     24970  block input operations
   1587767  block output operations
    351720  messages sent
    461428  messages received
     11651  signals received
   3436058  voluntary context switches
   6330784  involuntary context switches

ncpu 8
real 18661.43
user 150203.79
sys 12814.11
   2082684  maximum resident set size
     28908  average shared memory size
      1797  average unshared data size
       267  average unshared stack size
 277207835  page reclaims
    118424  page faults
       112  swaps
     41921  block input operations
   1597336  block output operations
    348368  messages sent
    457959  messages received
     12200  signals received
   2049506  voluntary context switches
   5806408  involuntary context switches
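The figures above match the resource-usage fields printed by FreeBSD's
time(1); a hedged sketch of how such a measurement could be reproduced (the
exact make invocation and port directory are assumptions, not taken from this
report):

```shell
# Collect wall-clock/user/sys time plus the full rusage summary for the
# port build.  -l prints the resource-usage fields shown above (maximum
# resident set size, page faults, context switches, ...); -p prints the
# real/user/sys lines in POSIX format.
cd /usr/ports/lang/gcc12
/usr/bin/time -l -p make build > build.log 2> time.log
```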

Besides, I fail to understand what this activity might have to do with
bootstrapping.
(From the book: bootstrapping a compiler means recompiling the compiler with
itself after first building it with the available system compiler. This is
necessary to ensure the compiler is not malfunctioning whenever the available
system compiler is a different version than the build target.)

Furthermore, I also think that having to build 8 (eight!) different
compilers in the course of a rather simple desktop deployment is quite
insane - but that's not directly related to this issue.

-- 
You are receiving this mail because:
You are the assignee for the bug.