Re: armv8.2-A+ tuned FreeBSD kernels and buildworld buildkernel times: an example

From: Mark Millard <marklmi_at_yahoo.com>
Date: Sat, 29 Apr 2023 21:09:17 UTC
On Apr 29, 2023, at 12:16, Mark Millard <marklmi@yahoo.com> wrote:

> Context: all worlds and kernels involved/built are non-debug style.
> 
> Note: clang15 through LLVM main (so far) has errors in both directions
> for the features for cortex-a78c. So I also used +flagm+nofp16fml .
> (The cortex-x1c also has such problems, but the details are
> different.)
> 
> Notation in table below:
> CA72:  matching world or kernel had been built using -mcpu=cortex-a72
> CA78C: matching world or kernel had been built using -mcpu=cortex-a78c+flagm+nofp16fml
> 
> System: Windows Dev Kit 2023 (4 cortex-a78c's and 4 cortex-x1c's):
> (both: armv8.2-A with a few more modern features)
> 
> Times to build system from scratch (buildworld buildkernel from same
> sources) . . .
> 
> System running:                   World built in: kernel built in:
> CA72  kernel, CA72  world                6601 sec          597 sec
> CA78C kernel, CA78C world                4680 sec          413 sec
> CA78C kernel, CA72  world (chroot)       4715 sec          422 sec
> 
> The CA72/CA72 is from before I'd built the CA78C world and kernel.
> All builds used -j8 . None had competing activity on the machine.
> 
> What this suggests is having an explicitly armv8.2+ tuned kernel
> makes a notable difference for -j8 buildworld buildkernel times
> on aarch64.
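
For a sense of scale, the table can be reduced to percentage
decreases in elapsed time. The following is only a small sketch
using the figures quoted above; the names (times, percent_reduction)
are mine for illustration and nothing here is a new measurement.

```python
# Elapsed times in seconds, copied from the table above.
# Keys are (kernel tuning, world tuning) of the running system.
times = {
    ("CA72", "CA72"): {"world": 6601, "kernel": 597},
    ("CA78C", "CA78C"): {"world": 4680, "kernel": 413},
    ("CA78C", "CA72"): {"world": 4715, "kernel": 422},  # CA72 world via chroot
}


def percent_reduction(before, after):
    """Percentage decrease in elapsed time going from before to after."""
    return 100.0 * (before - after) / before


baseline = times[("CA72", "CA72")]
for (kernel, world), t in times.items():
    if t is baseline:
        continue
    print(
        f"{kernel} kernel, {world} world: "
        f"buildworld {percent_reduction(baseline['world'], t['world']):.1f}% less, "
        f"buildkernel {percent_reduction(baseline['kernel'], t['kernel']):.1f}% less"
    )

# Same CA78C kernel, CA72 world (chroot) versus CA78C world:
same_kernel = percent_reduction(
    times[("CA78C", "CA72")]["world"], times[("CA78C", "CA78C")]["world"]
)
print(f"world tuning alone (same kernel): {same_kernel:.1f}% less")
```

By this arithmetic both CA78C-kernel rows come out roughly 29% to
31% below the CA72/CA72 row, while the two rows that share the
CA78C kernel differ by under 1% for buildworld. That is consistent
with the kernel being where most of the difference comes from, for
this one machine and these single runs.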

"Tuned" here includes newer-feature use, so incompatible with the
likes of armv8.0-A hardware, for example. The FEAT_LSE atomics
use would be an example. But I've done nothing to investigate
subsetting the new-feature use to isolate what makes the biggest
contributions to the elapsed-time decrease.

> The Windows Dev Kit 2023 is the first (and only) armv8.1+ based
> system that I've had access to. So testing such properties is
> limited to the one context.
> 
> Also, I've not had access to the Windows Dev Kit 2023 for long:
> first experiments.
> 
> 
> Notes on my historically-usual aarch64 builds:
> 
> On cortex-a72 hardware, my context is -mcpu=cortex-a72 based. This
> once exposed a lack of sufficient synchronization in a place in
> the USB subsystem. (Running the same system on cortex-a53 hardware
> did not fail. Running -mcpu=cortex-a53 based world+kernel on a
> cortex-a72 did not fail. A cortex-a53 hardware running the
> -mcpu=cortex-a53 based world+kernel did not fail.)
> 
> Until the hardware failed, there was a time when I also had
> access to a cortex-a57 FreeBSD system.
> 
> I do not do such -mcpu= tailoring on the only FreeBSD amd64 that
> I've access to, a ThreadRipper 1950X. I do such only for the lower
> end systems that I have access to. My aarch64 access is all to
> lower end, not upper end.


I should have reported that my recent activity for this is based
on: main-n262658-b347c2284603-dirty, b347c2284603 being from late
Apr 28, 2023 UTC. (The "-dirty" is from some historical patches
that I use.) Some of my activity has been from somewhat earlier,
but I wanted to pick up another openzfs fix or 2 that had
happened since then.
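
For anyone comparing against their own build, the version string
above can be split into its parts. This is a sketch that assumes
the layout <branch>-n<commit count>-<short hash>[-dirty]; the
function name parse_version is hypothetical, not a FreeBSD tool.

```python
import re

# Assumed layout: <branch>-n<commit count>-<short hash>[-dirty]
VERSION_RE = re.compile(
    r"^(?P<branch>.+)-n(?P<count>\d+)-(?P<hash>[0-9a-f]+)(?P<dirty>-dirty)?$"
)


def parse_version(text):
    """Split a version string of the assumed layout into its fields."""
    m = VERSION_RE.match(text)
    if m is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return {
        "branch": m["branch"],
        "count": int(m["count"]),
        "hash": m["hash"],
        "dirty": m["dirty"] is not None,  # True when local changes were present
    }


info = parse_version("main-n262658-b347c2284603-dirty")
print(info)
```

The hash field is what to look up in the git history; the dirty
flag only says that uncommitted local changes were present, not
what they were.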


===
Mark Millard
marklmi at yahoo.com