Re: -CURRENT compilation time

From: Stefan Esser <se_at_freebsd.org>
Date: Wed, 08 Sep 2021 12:50:37 UTC
Am 08.09.21 um 10:57 schrieb David Chisnall:
> On 07/09/2021 18:02, Stefan Esser wrote:
>> Wouldn't this break META_MODE?
> 
> I have never managed to get META_MODE to work but my understanding is that
> META_MODE is addressing a problem that doesn't really exist in any other build
> system that I've used: that dependencies are not properly tracked.

META_MODE allows for complex interdependencies. They are not an issue in the
GNU/Linux world, since components are not integrated in the same way as has
been the practice in BSD for many decades.

> When I do a build of LLVM with the upstream build system with no changes, it
> takes Ninja approximately a tenth of a second to stat all of the relevant files
> and tell me that I have no work to do.  META_MODE apparently lets the FreeBSD
> build system extract these dependencies and do something similar, but it's not
> enabled by default and it's difficult to make work.

I tend to disagree with the last five words of your last sentence.

It took me just a few seconds to activate, and it has worked without fault
since.

There are only two trivial steps, but it is easy to miss the fact that
WITH_META_MODE has to be added to /etc/src-env.conf, not /etc/src.conf:

1) Add "WITH_META_MODE=yes" to /etc/src-env.conf (create the file if it
   does not exist)

2) Add "device filemon" to your kernel configuration, or add filemon to
   the kld_list variable in /etc/rc.conf to load the kernel module at boot

(The kernel module can of course also be manually loaded at any time.)
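A minimal sketch of the two steps as shell commands. A scratch file stands
in for /etc/src-env.conf so the sketch is safe to run anywhere; the real
step 2 commands are shown as comments and require root on a FreeBSD system:

```shell
# Step 1: enable META_MODE in src-env.conf (scratch copy used here;
# on a real system, append to /etc/src-env.conf as root instead).
conf=$(mktemp)
echo 'WITH_META_MODE=yes' >> "$conf"

# Step 2 on a real FreeBSD system would be (root required):
#   kldload filemon                 # load the module right now
#   sysrc kld_list+=" filemon"      # load it automatically at boot

# Confirm the knob is present exactly once:
grep -c '^WITH_META_MODE=yes' "$conf"
rm -f "$conf"
```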

>> I'd rather be able to continue building the world within a few minutes
>> (generally much less than 10 minutes, as long as there is no major LLVM
>> upgrade) than have a faster LLVM build and then a slower build of the world ...
> 
> The rest of this thread has determined that building LLVM accounts for half of
> the build time in a clean FreeBSD build.  LLVM's CMake is not a great example:
> it has been incrementally improved since CMake 2.8 and doesn't yet use any of
> the modern CMake features that allow encapsulating targets and providing import
> / export configurations.

The build of LLVM is skipped if META_MODE is enabled, except when there
really was a change to some LLVM header that forces a complete rebuild.

A further speed-up can be had with ccache, but I found that it does not
make much of a difference on my system.

> In spite of that, it generates a ninja file that compiles *significantly*
> faster than the bmake-based system in FreeBSD.  In other projects that I've
> worked on with a similar-sized codebase to FreeBSD that use CMake + Ninja, I've
> never had the same problems with build speed that I have with FreeBSD.

Possible, but when I watch the LLVM build with top or systat, I see that
all my cores are busy nearly throughout the full build. There are two
methods that could theoretically speed up the build:

1) make use of idle CPU cores

2) reduce the number of object files to build

I do not see that there is much potential for 1), since there is a high
degree of parallelism:

>>> World build completed on Wed Sep  1 13:40:14 CEST 2021
>>> World built in 99 seconds, ncpu: 32, make -j32
--------------------------------------------------------------
       98.69 real       741.61 user       234.55 sys

>>> World build completed on Thu Sep  2 23:22:04 CEST 2021
>>> World built in 98 seconds, ncpu: 32, make -j32
--------------------------------------------------------------
       98.34 real       780.41 user       228.67 sys

>>> World build completed on Fri Sep  3 19:09:39 CEST 2021
>>> World built in 165 seconds, ncpu: 32, make -j32
--------------------------------------------------------------
      164.84 real      1793.62 user       241.11 sys

>>> World build completed on Sun Sep  5 20:23:29 CEST 2021
>>> World built in 135 seconds, ncpu: 32, make -j32
--------------------------------------------------------------
      135.59 real       695.45 user       214.76 sys

>>> World build completed on Mon Sep  6 21:10:44 CEST 2021
>>> World built in 478 seconds, ncpu: 32, make -j32
--------------------------------------------------------------
      479.22 real     11374.40 user       474.19 sys

>>> World build completed on Wed Sep  8 11:51:03 CEST 2021
>>> World built in 652 seconds, ncpu: 32, make -j32
--------------------------------------------------------------
      652.14 real     17857.03 user       753.41 sys

Calculating "(user + sys) / real" I get factors between 10 (in case
of only minor changes) and 28 for larger recompiles (e.g. if lots
of source files depend on an updated header), with 32 being the
theoretical limit for all cores continuously active during the build.
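For illustration, that factor can be computed directly from the timing
lines of the first and last runs quoted above (the numbers are taken from
those logs; the arithmetic itself is the only thing added here):

```shell
# Parallelism factor = (user + sys) / real
awk 'BEGIN { printf "%.1f\n", (741.61  +  234.55) / 98.69  }'  # Sep 1: minor changes
awk 'BEGIN { printf "%.1f\n", (17857.03 + 753.41) / 652.14 }'  # Sep 8: large rebuild
```

The first run barely exceeds a factor of 10, the second approaches 28.5,
close to the hard limit of 32 hardware threads.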

META_MODE does not understand that updated build tools do not always
require a full rebuild, but special cases have been added to the
Makefile to reduce the number of unnecessary rebuilds.

> Working on LLVM, I generally spend well under 10% of my time either waiting for
> builds or fighting the build system.  Working on FreeBSD, I generally spend
> over 90% of my time waiting for builds or fighting the build system.  This
> means that my productivity contributing to FreeBSD is almost zero.
> 
> For reference, changes to LLVM typically build for me in under 30 seconds with
> Ninja, unless I've changed a header that everything

You should get the same effect from META_MODE.

And META_MODE allows executing make in any subtree of a larger source
tree, provided there is a Makefile for that part of the sources, e.g.:

$ cd /usr/src/usr.bin/clang/lld
$ make -j 32
[...]
$ touch /usr/src/contrib/llvm-project/lld/Common/Strings.cpp
$ time make -j 32
Building /usr/obj/usr/git/src/amd64.amd64/usr.bin/clang/lld/Common/Strings.o
Building /usr/obj/usr/git/src/amd64.amd64/usr.bin/clang/lld/ld.lld.full
Building /usr/obj/usr/git/src/amd64.amd64/usr.bin/clang/lld/ld.lld.debug
Building /usr/obj/usr/git/src/amd64.amd64/usr.bin/clang/lld/ld.lld

real	0m1.699s
user	0m1.454s
sys	0m3.745s

This assumes that your world is up-to-date in general, which should be
the case when working on a single component, e.g. one of the programs
belonging to CLANG/LLVM. Compiling a single file and linking a single
target of course does not allow for any parallelism.

$ cd /usr/src/usr.bin/clang
$ time make -j 32 > /dev/null
real	0m24.650s
user	8m58.381s
sys	0m34.786s

Some files have changed between the last "make buildworld" and now, but
it takes less than 10 minutes of CPU time (i.e. about 1 minute of real
time on a system with 6 cores / 12 threads) to update the obj directory.
My system achieved an overall parallelism of 24 on that run ...

And thereafter, there is hardly any overhead caused by bmake:

$ time make -j 32 > /dev/null
real	0m0.064s
user	0m0.453s
sys	0m0.029s

The overhead of scanning all LLVM components when there was no change is
absolutely negligible.

Now let's touch a header that is included by a number of files:

$ touch /usr/src/contrib/llvm-project/lld/include/lld/Common/Strings.h
$ time make -j 32 | wc
      41      82    3130

real	0m17.727s
user	2m56.021s
sys	0m8.237s

The ratio of user+sys to real is a little above 10, which is not too bad
if you consider that the link phase needs to wait for all object files.

I really do not see that there is much to gain here ...

> In particular, building FreeBSD on a 10-24 core machine has very long periods
> where a number of the cores are completely idle.

I do not observe this, and I'm using a 16-core / 32-thread CPU, which
would be busy on all cores with just 16 parallel threads, just not
taking advantage of SMT. And I see an overall factor (and load average)
of 10 for small changes (with lots of I/O and little processing), and
28 and beyond if larger parts of the system need to be rebuilt.
(BTW: This is a system with SATA drives in a RAIDZ1 configuration,
which limits the I/O rate to way below what an SSD-based system might
achieve.)

> Ninja also has a few other nice features that improve performance relative to
> bmake:
> 
>  - It lets you put jobs in different pools.  In LLVM this is used to put link
> and compile jobs in different pools because linking with LLD uses multiple
> threads and a lot more memory than compilation, so a 10-core machine may want
> to do 12 compile jobs in parallel but only 2 link jobs.  This makes it much
> easier to completely saturate the machine.

That may be an advantage on systems with relatively little RAM (compared
to the number of threads executing in parallel).

But I do see compilations and linking executing in parallel, i.e. a
single linker invocation while compiles run in parallel. Due to the
multi-threaded execution of LLD, that might cause the load average to
slightly exceed the number of threads, but only for a relatively short
period of time.

>  - Ninja provides each parallel build task with a separate pipe for stdout and
> stderr, and does not print their output unless a build step fails (or unless
> you build with -v).  With bmake, if a parallel build fails I have to rerun the
> build without -j, because the output is interleaved with succeeding jobs and
> it's difficult to see what actually failed.  With ninja, the output is from
> each failed job, with no interleaving.

This makes it easier to spot the error line, but I've gotten used to the
way the output is formatted by a parallel make run.

I always redirect the output into a file (either with nohup or with tee,
depending on whether I plan to watch the build). I hardly ever have to use
a single-threaded build to spot an error; it is generally obvious from the
build log in that file.
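To sketch that workflow: on a real system the command would be something
like "make -j 32 buildworld 2>&1 | tee /var/tmp/build.log"; here the
build output is faked with echo so the example runs anywhere, and grep -n
then locates the error line by number in the captured log:

```shell
# Simulate an interleaved parallel build whose output we tee into a log.
log=$(mktemp)
{ echo 'cc -c a.c'
  echo 'b.c:12:1: error: expected ;'
  echo 'cc -c c.c'
} 2>&1 | tee "$log" >/dev/null

# Find the first error in the log, with its line number:
grep -n 'error' "$log"    # -> 2:b.c:12:1: error: expected ;
rm -f "$log"
```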

But I do agree that separate output per build job makes this easier, just
not enough that I'd want to rework the whole build system. And especially
not if it breaks META_MODE.

Regards, STefan