Re: Resolved: devel/llvm13 build: "ninja: build stopped: subcommand failed"

From: Mark Millard <marklmi_at_yahoo.com>
Date: Sun, 14 Aug 2022 16:35:23 UTC
On 2022-Aug-14, at 07:50, Nuno Teixeira <eduardo@freebsd.org> wrote:

Hello Mark,

> I use poudriere with USE_TMPFS=no (ofc, because of low mem)
> The problem "ninja: build stopped: subcommand failed"

That is never the original error. It is just ninja reporting
that it observed a failure, generally in another process that
is involved in the build. A wide variety of errors end up
with the same "ninja: build stopped: subcommand failed"
notice.

The original error should be earlier in the log or on the
console ( or in /var/log/messages ). The "was killed: failed
to reclaim memory" is an example.
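
As a rough illustration of looking for the original error, here is a
minimal sh sketch that searches a log file for the two kernel notices
quoted in this thread. The function name find_oom_notices is
hypothetical, and the default path assumes the notices reached
/var/log/messages:

```shell
#!/bin/sh
# Hypothetical helper: print the kernel notices that typically precede
# ninja's generic "build stopped" message.
# $1 is the log file to search; it defaults to /var/log/messages.
find_oom_notices() {
    grep -E 'was killed: failed to reclaim memory|swap_pager: out of swap space' \
        "${1:-/var/log/messages}"
}
```

For example, "find_oom_notices /var/log/messages" after a failed build.
The same patterns could be searched for in the builder's own log.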

With 16 GiBytes of RAM you could have up to something like
60 GiByte of swap without FreeBSD complaining about being
potentially mistuned. (It would complain before 64 GiBytes
of SWAP.) 16+60 would be 76 GiBytes for RAM+SWAP.

I forgot to ask about UFS vs. ZFS being in use: which is in
use? (ZFS uses more RAM.)

> I have some time now, and it's caused by a build-time memory peak
> that affects people with less than 32/64GB of memory. To get it to
> build, it must be built using one builder with one core, which takes
> about 7 hours on my machine, or with 6c+6t on 12.3 i386, which takes
> about 45min (12.3 i386 is the only jail in which I can use all cores).

Last I tried, I built all the various devel/llvm* on an 8 GiByte
RPi4B, 4 builders active and ALLOW_MAKE_JOBS=yes in use.
4 FreeBSD cpus. So the load average would have been around 16+
much of the time during devel/llvm13 's builder activity.
USE_TMPFS=data in use.

Similarly for a 16 GiByte machine, but it is also an aarch64
context, also with 4 FreeBSD cpus.

But I use in /boot/loader.conf:

#
# Delay when persistent low free RAM leads to
# Out Of Memory killing of processes:
vm.pageout_oom_seq=120
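
To confirm that the setting is actually present, something like the
following sh sketch could be used. The function name loader_oom_seq is
hypothetical; it only parses a loader.conf-style file and does not
query the running kernel:

```shell
#!/bin/sh
# Hypothetical helper: print the vm.pageout_oom_seq value assigned in a
# loader.conf-style file ($1, default /boot/loader.conf), if any.
# Commented-out lines are ignored; optional double quotes are accepted.
loader_oom_seq() {
    sed -n 's/^[[:space:]]*vm\.pageout_oom_seq="\{0,1\}\([0-9][0-9]*\)"\{0,1\}[[:space:]]*$/\1/p' \
        "${1:-/boot/loader.conf}"
}
```

On a running FreeBSD system, "sysctl vm.pageout_oom_seq" reports the
value that is currently in effect.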

This has been historically important to avoiding the likes of
"was killed: failed to reclaim memory" and related notices on
various armv7 and aarch64 small board computers used to
buildworld buildkernel and build ports, using all the cores.

The only amd64 system that I've access to has 32 FreeBSD cpus
and 128 GiBytes of RAM. Not a good basis for a comparison test
with your context. I've no i386 access at all.

> llvm 12 build without problems

Hmm. I'll try building devel/llvm13 on aarch64 with periodic
sampling of the memory use to see maximum observed figures
for SWAP and for various categories of RAM, as well as the
largest observed load averages.
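
A minimal sh sketch of that kind of periodic sampling follows. It is
not the tool that produced the figures in this message. The names
max_observed and sample_swap_used are hypothetical, and the sampler
assumes that swapinfo -k prints a header line, one line per device
with "Used" in the third column, and possibly a final "Total" line:

```shell
#!/bin/sh
# Hypothetical helper: read one number per line on stdin and print the
# largest value seen (prints nothing for empty input).
max_observed() {
    awk '{ v = $1 + 0 }
         NR == 1 || v > max { max = v }
         END { if (NR > 0) print max }'
}

# Hypothetical sampler: print swap used (in 1K blocks) every $1 seconds,
# $2 times. Assumes the swapinfo -k column layout described above.
sample_swap_used() {
    i=0
    while [ "$i" -lt "$2" ]; do
        swapinfo -k | awk 'NR > 1 && $1 != "Total" { used += $3 }
                           END { print used + 0 }'
        sleep "$1"
        i=$((i + 1))
    done
}
```

For example, "sample_swap_used 10 360 | max_observed" would sample
every 10 seconds for about an hour and then print the peak.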

ZFS is in use in this context. I could try UFS as well.

Swap: 30720Mi Total on the 8GiByte RPi4B.
So about 38 GiBytes RAM+SWAP available.
We should see how much SWAP is used.

Before starting poudriere, shortly after a reboot:

19296Ki MaxObs(Act+Lndry+SwapUsed)
(No SWAP in use at the time.)

# poudriere bulk -jmain-CA72-bulk_a -w devel/llvm13

For the from-scratch build, it reports:

[00:00:34] Building 91 packages using up to 4 builders

The ports tree is about a month back:

# ~/fbsd-based-on-what-commit.sh -C /usr/ports/
branch: main
merge-base: 872199326a916efbb4bf13c97bc1af910ba1482e
merge-base: CommitDate: 2022-07-14 01:26:04 +0000
872199326a91 (HEAD -> main, freebsd/main, freebsd/HEAD) devel/ruby-build: Update to 20220713
n589512 (--first-parent --count for merge-base)

But, if I gather right, the problem you see goes back
before that.

I cannot tell how 4 FreeBSD cpus compare to the
count that the Lenovo Legion 5 presents.

I'll report on its maximum observed figures once the
build stops. It will be a while before the RPi4B
gets that far.

The ports built before devel/llvm13's builder starts
will lead to load averages over 4, from up to 4
builders each potentially using up to around 4
processes. I'll see about starting separate tracking
once devel/llvm13's builder has started, if I happen
to observe it at the right time.

> Cheers
> 
> Mark Millard <marklmi@yahoo.com> wrote on Sunday, 14/08/2022 at 03:54:
> Nuno Teixeira <eduardo_at_freebsd.org> wrote on
> Date: Sat, 13 Aug 2022 16:52:09 UTC :
> 
> > . . .
> > I've tested it but it still fails:
> > ---
> > pid 64502 (c++), jid 7, uid 65534, was killed: failed to reclaim memory
> > swap_pager: out of swap space
> > ---
> > on a Lenovo Legion 5, 16GB RAM and 4GB swap.
> > . . .
> 
> This leaves various points unclear:
> 
> poudriere style build? Some other style?
> 
> (I'll state questions in a form generally for a poudriere style
> context. Some could be converted to analogous points for other
> build-styles.)
> 
> How many poudriere builders allowed (-JN) ?
> 
> /usr/local/etc/poudriere.conf :
> ALLOW_MAKE_JOBS=yes in use?
> ALLOW_MAKE_JOBS_PACKAGES=??? in use?
> USE_TMPFS=??? With what value? Anything other than "data" or "no"?
> 
> /usr/local/etc/poudriere.d/make.conf (or the like):
> MAKE_JOBS_NUMBER=??? in use? With what value?
> 
> Is tmpfs in use such that it will use RAM+SWAP when the
> used tmpfs space is large?
> 
> How much free space is available for /tmp ?
> 
> Are you using something like ( in, say, /boot/loader/conf ):

That should have been: /boot/loader.conf

Sorry.

> #
> # Delay when persistent low free RAM leads to
> # Out Of Memory killing of processes:
> vm.pageout_oom_seq=120
> 
> 
> How many FreeBSD cpus does a Lenovo Legion 5 present
> in the configuration used?
> 


===
Mark Millard
marklmi at yahoo.com