Re: Resolved: devel/llvm13 build: "ninja: build stopped: subcommand failed"

From: Mark Millard <marklmi_at_yahoo.com>
Date: Mon, 15 Aug 2022 01:40:49 UTC
On 2022-Aug-14, at 09:35, Mark Millard <marklmi@yahoo.com> wrote:

> On 2022-Aug-14, at 07:50, Nuno Teixeira <eduardo@freebsd.org> wrote:
> 
> . . .
>> I have some time now. It's caused by a peak of memory use during the build that affects people with less than 32/64 GB of memory. To work around it, the port must be built using one builder with one core, which takes about 7 hours on my machine, or with 6c+6t on 12.3 i386, which takes about 45 min (12.3 i386 is the only jail in which I can use all cores).
> 
> Last I tried, I built all the various devel/llvm* ports on an 8 GiByte
> RPi4B, with 4 builders active and ALLOW_MAKE_JOBS=yes in use.
> 4 FreeBSD cpus, so the load average would have been around 16+
> much of the time during devel/llvm13's builder activity.
> USE_TMPFS=data was in use.
> 
> Similarly for a 16 GiByte machine --but it is also an aarch64
> context, also 4 FreeBSD cpus.
> 
> But I use in /boot/loader.conf:
> 
> #
> # Delay when persistent low free RAM leads to
> # Out Of Memory killing of processes:
> vm.pageout_oom_seq=120
> 
> This has been historically important in avoiding the likes of
> "was killed: failed to reclaim memory" and related notices on
> various armv7 and aarch64 small board computers used to run
> buildworld, buildkernel, and port builds using all the cores.
> 
> The only amd64 system that I've access to has 32 FreeBSD cpus
> and 128 GiBytes of RAM. Not a good basis for a comparison test
> with your context. I've no i386 access at all.
> 
>> llvm 12 build without problems
> 
> Hmm. I'll try building devel/llvm13 on aarch64 with periodic
> sampling of the memory use to see maximum observed figures
> for SWAP and for various categories of RAM, as well as the
> largest observed load averages.
> 
> This is a ZFS context; I could try UFS as well.
> 
> Swap: 30720Mi Total on the 8GiByte RPi4B.
> So about 38 GiBytes RAM+SWAP available.
> We should see how much SWAP is used.
> 
> Before starting poudriere, shortly after a reboot:
> 
> 19296Ki MaxObs(Act+Lndry+SwapUsed)
> (No SWAP in use at the time.)
> 
> # poudriere bulk -jmain-CA72-bulk_a -w devel/llvm13
> 
> For the from-scratch build, poudriere reports:
> 
> [00:00:34] Building 91 packages using up to 4 builders
> 
> The ports tree is about a month back:
> 
> # ~/fbsd-based-on-what-commit.sh -C /usr/ports/
> branch: main
> merge-base: 872199326a916efbb4bf13c97bc1af910ba1482e
> merge-base: CommitDate: 2022-07-14 01:26:04 +0000
> 872199326a91 (HEAD -> main, freebsd/main, freebsd/HEAD) devel/ruby-build: Update to 20220713
> n589512 (--first-parent --count for merge-base)
> 
> But, if I gather right, the problem you see goes back
> before that.
> 
> I cannot tell how 4 FreeBSD cpus compare to the
> count that the Lenovo Legion 5 gets.
> 
> I'll report on its maximum observed figures once the
> build stops. It will be a while before the RPi4B
> gets that far.
> 
> The ports built before devel/llvm13's builder starts
> will lead to load averages over 4, from up to 4
> builders, each potentially using around 4
> processes. I'll start a separate tracking run once
> devel/llvm13's builder begins, if I happen to notice
> it at the right time.
> 
> . . .
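
Regarding the vm.pageout_oom_seq tunable quoted above: a hypothetical
sketch of persisting and verifying it. A scratch file stands in for
/boot/loader.conf here so the commands are safe to demonstrate; the
sysctl line only applies on a running FreeBSD system.

```shell
# Persist the OOM-delay tunable across reboots.
conf=$(mktemp)                               # stands in for /boot/loader.conf
printf '%s\n' 'vm.pageout_oom_seq=120' >> "$conf"
grep -c '^vm.pageout_oom_seq=120$' "$conf"   # prints: 1
# On a running FreeBSD system it can also be set immediately:
#   sysctl vm.pageout_oom_seq=120
rm -f "$conf"
```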

I actually have tried a few builds on different
machines. The 8GiByte RPi4B takes a long time and
is currently omitted from this report.


128 GiByte amd64 ThreadRipper 1950X (16 cores, so 32 FreeBSD cpus):
but using MAKE_JOBS_NUMBER=4 (with both FLANG and MLIR)

On amd64 I started a build with FLANG and MLIR enabled,
using MAKE_JOBS_NUMBER=4 in devel/llvm13/Makefile to
limit the build to 4 FreeBSD cpus. It is a ZFS context.
Given the 128 GiBytes of RAM, memory pressure should
have little effect. But I will record the
MaxObs(Act+Lndry+SwapUsed) figures and the like.
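
For reference, a sketch of how that job limit can be expressed. I edited
the port's Makefile for this test; setting it in poudriere's make.conf is
the less invasive variant (path per the standard poudriere layout):

```
# In devel/llvm13/Makefile (as done for this test), or in
# /usr/local/etc/poudriere.d/make.conf to avoid editing the port:
MAKE_JOBS_NUMBER=4
```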

---Begin OPTIONS List---
===> The following configuration options are available for llvm13-13.0.1_3:
     BE_AMDGPU=on: AMD GPU backend (required by mesa)
     BE_WASM=on: WebAssembly backend (required by firefox via wasi)
     CLANG=on: Build clang
     COMPILER_RT=on: Sanitizer libraries
     DOCS=on: Build and/or install documentation
     EXTRAS=on: Extra clang tools
     FLANG=on: Flang FORTRAN compiler
     GOLD=on: Build the LLVM Gold plugin for LTO
     LIT=on: Install lit and FileCheck test tools
     LLD=on: Install lld, the LLVM linker
     LLDB=on: Install lldb, the LLVM debugger
     MLIR=on: Multi-Level Intermediate Representation
     OPENMP=on: Install libomp, the LLVM OpenMP runtime library
     PYCLANG=on: Install python bindings to libclang
====> Options available for the single BACKENDS: you have to select exactly one of them
     BE_FREEBSD=off: Backends for FreeBSD architectures
     BE_NATIVE=off: Backend(s) for this architecture (X86)
     BE_STANDARD=on: All non-experimental backends
===> Use 'make config' to modify these settings
---End OPTIONS List---

[02:23:55] [01] [02:04:29] Finished devel/llvm13 | llvm13-13.0.1_3: Success

For just the devel/llvm13 builder's activity (no parallel
builds, and excluding the prerequisite builds):

load averages:   . . . MaxObs:   6.76,   4.75,   4.38

6812Mi MaxObs(Act+Lndry+SwapUsed) but no use of SWAP observed.

Note: MAKE_JOBS_NUMBER does not constrain any lld
      process from using all available FreeBSD cpus
      (via threading) --and multiple lld's can be
      active at the same time.

So this looks to fit in a 16 GiByte RAM context just fine,
no SWAP needed.
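
The MaxObs(...) figures are just running maxima over periodic samples. A
sketch of that bookkeeping in portable shell; the actual FreeBSD-specific
sampler (parsing top/sysctl output) is assumed, and the sample values
below are stand-ins:

```shell
# Track the maximum observed value across periodic samples (in Ki).
max=0
while read -r sample; do
    if [ "$sample" -gt "$max" ]; then max=$sample; fi
done <<'EOF'
19296
6975744
6811001
EOF
echo "MaxObs: ${max}Ki"    # prints: MaxObs: 6975744Ki
```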

I'll try MAKE_JOBS_NUMBER=12 instead and rerun on the same
machine.



128 GiByte amd64 ThreadRipper 1950X (16 cores, so 32 FreeBSD cpus):
but using MAKE_JOBS_NUMBER=12 (with both FLANG and MLIR)

(The OPTIONS list was identical to the one shown for the first run above.)

[00:55:37] [01] [00:55:30] Finished devel/llvm13 | llvm13-13.0.1_3: Success

load averages:   . . . MaxObs:   12.45,  12.20,  11.52

13074Mi MaxObs(Act+Lndry+SwapUsed) but no use of SWAP observed.

Note: MAKE_JOBS_NUMBER does not constrain any lld
      process from using all available FreeBSD cpus
      (via threading) --and multiple lld's can be
      active at the same time.

(16+4)*1024 Mi - 13074 Mi == 7406 Mi for other RAM+SWAP use.
(Crude estimates relative to your context.) That would seem
to be plenty.
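
The headroom arithmetic above, spelled out (assuming the (16+4) is GiB of
RAM plus swap in your context):

```shell
total=$(( (16 + 4) * 1024 ))    # 16 GiB RAM + 4 GiB swap, in Mi
maxobs=13074                    # MaxObs(Act+Lndry+SwapUsed), in Mi
echo "headroom: $(( total - maxobs )) Mi"    # prints: headroom: 7406 Mi
```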


Conclusion:

It is far from clear what was contributing to your
(16+4)*1024 MiBytes proving insufficient. Perhaps
unintentional tmpfs use, such as a typo in USE_TMPFS
in /usr/local/etc/poudriere.conf? I really have no
clue; that example is arbitrary.
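
If you want to rule that example out, one quick sanity check is to look at
the effective setting. A scratch file stands in for
/usr/local/etc/poudriere.conf here; on a real system you would grep the
actual file:

```shell
# Hypothetical sanity check: see what USE_TMPFS is actually set to.
conf=$(mktemp)                        # stands in for poudriere.conf
printf 'USE_TMPFS=data\n' > "$conf"
grep '^USE_TMPFS' "$conf"             # prints: USE_TMPFS=data
rm -f "$conf"
```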



Other notes:

# uname -apKU
FreeBSD amd64_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #50 main-n256584-5bc926af9fd1-dirty: Wed Jul  6 17:44:43 PDT 2022     root@amd64_ZFS:/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/sys/GENERIC-NODBG amd64 amd64 1400063 1400063

Note that the above is without WITNESS and without INVARIANTS and the like.

The only thing committed to main's contrib/llvm-project after that
was 9ef1127008 :

QUOTE
Apply tentative llvm fix for avoiding fma on PowerPC SPE
Merge llvm review D77558, by Justin Hibbits:

  PowerPC: Don't hoist float multiply + add to fused operation on SPE

  SPE doesn't have a fmadd instruction, so don't bother hoisting a
  multiply and add sequence to this, as it'd become just a library call.
  Hoisting happens too late for the CTR usability test to veto using the CTR
  in a loop, and results in an assert "Invalid PPC CTR loop!".
END QUOTE

===
Mark Millard
marklmi at yahoo.com