RPi4B and self-hosted buildworld buildkernel times: using more than -j3 is a waste in my tests.

Mark Millard marklmi at yahoo.com
Sat Aug 15 07:24:06 UTC 2020


Self hosted, from scratch, buildworld buildkernel times
(head -r363590 non-debug build, more context notes
later):

RPi4B set for 3072 MiByte context:

-j4 buildworld:  44783 sec (a little under 12.5 hours)
-j3 buildworld:  44034 sec (a little under 12.3 hours)
-j2 buildworld:  49070 sec (a little under 13.7 hours)
-j1 buildworld:  71083 sec (a little under 19.8 hours)

-j4 buildkernel:  2876 sec (a little under 48 minutes)
-j3 buildkernel:  2895 sec (a little under 49 minutes)
-j2 buildkernel:  3289 sec (a little under 55 minutes)
-j1 buildkernel:  4866 sec (a little under 82 minutes)

So: -j4 does not meaningfully cut the time required compared
to -j3 (buildworld took slightly longer; buildkernel was only
19 sec faster). It appears that larger -jN figures would also
not cut the time compared to -j3.
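
As a worked check of the RPi4B figures above, here is a minimal
sketch that turns the wall-clock times into speedup relative to
-j1 and into per-job efficiency (speedup divided by N). The names
RPI4B_SECONDS and speedup_table are made up for illustration; only
the numbers come from the measurements above.

```python
# Wall-clock seconds copied from the RPi4B table above.
RPI4B_SECONDS = {
    "buildworld": {1: 71083, 2: 49070, 3: 44034, 4: 44783},
    "buildkernel": {1: 4866, 2: 3289, 3: 2895, 4: 2876},
}


def speedup_table(times):
    """Return {jobs: (speedup vs -j1, speedup / jobs)} for one target.

    `times` maps the -jN value to elapsed seconds and must include N=1.
    """
    base = times[1]
    return {jobs: (base / secs, base / secs / jobs)
            for jobs, secs in sorted(times.items())}


if __name__ == "__main__":
    for target, times in RPI4B_SECONDS.items():
        for jobs, (speedup, efficiency) in speedup_table(times).items():
            print(f"-j{jobs} {target}: {speedup:.2f}x vs -j1, "
                  f"{efficiency:.0%} per-job efficiency")
```

With these inputs, -j3 buildworld comes out at about 1.61x the
-j1 rate and -j4 at about 1.59x, which is the "no gain past -j3"
observation expressed as a ratio.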


Context notes:

Build commands had "buildworld buildkernel" on the
command lines.

UEFI/ACPI based boot (v1.17) for the RPi4B.

Each "buildworld buildkernel" was from-scratch and using
the same src.conf and make.conf files (under other names).

The file system is on a USB3 SSD and no sdcard is involved.
The context is limited to 3072 MiByte in order to avoid the
DMA handling problems that would otherwise happen.

over_voltage=6 and arm_freq=2000 were in use. This makes
the cortex-A72 clock rate match the MACCHIATObin Double
Shot that I have access to (2 GHz). The MACCHIATObin got:

-j4 buildworld:  18789 sec (a little under  5.3 hours)
-j1 buildworld:  54331 sec (a little under 15.1 hours)

-j4 buildkernel:  1296 sec (a little under 22 minutes)
-j1 buildkernel:  3800 sec (a little under 64 minutes)

So: much less time required compared to the RPi4B at the
same clock rate. (The MACCHIATObin has a SATA SSD but
buildworld buildkernel is not I/O bound.)
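
To put a number on "much less time", the sketch below divides
each RPi4B time by the matching MACCHIATObin time. The names
RPI4B, MACCHIATOBIN and time_ratio are hypothetical helpers for
this illustration; the figures are the ones listed in this note.

```python
# Elapsed seconds for the -jN values measured on both machines.
RPI4B = {
    "buildworld": {1: 71083, 4: 44783},
    "buildkernel": {1: 4866, 4: 2876},
}
MACCHIATOBIN = {
    "buildworld": {1: 54331, 4: 18789},
    "buildkernel": {1: 3800, 4: 1296},
}


def time_ratio(slow, fast, target, jobs):
    """How many times longer `slow` took than `fast` for target at -jN."""
    return slow[target][jobs] / fast[target][jobs]


if __name__ == "__main__":
    for target in ("buildworld", "buildkernel"):
        for jobs in (1, 4):
            ratio = time_ratio(RPI4B, MACCHIATOBIN, target, jobs)
            print(f"-j{jobs} {target}: RPi4B took {ratio:.2f}x "
                  f"the MACCHIATObin time")
```

The ratio is roughly 1.3x at -j1 but roughly 2.2x to 2.4x at -j4,
so the gap between the two machines widens as more cores are kept
busy.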

There are huge differences in the effectiveness of the
RAM caches and possibly other aspects related to RAM access.
I checked with a benchmark program that exposes some overall
effects of such variations and that allows testing with
various thread counts.

In the benchmarking, over the range of problem sizes that
fit in the L1 & L2 caches, the RPi4B and MACCHIATObin were a
close match. But as problem sizes grew to much larger than
the caches, the difference became large, especially for the
likes of 4 threads (as with -j4).

(An OverDrive 1000 with its cortex-a57 @ 1.7 GHz takes
even less time: again, RAM caches and/or other aspects
related to RAM access contribute greatly.)

For reference:

# more ~/src.configs/src.conf.cortexA72-clang-bootstrap.aarch64-host 
TO_TYPE=aarch64
#
KERNCONF=GENERIC-NODBG
TARGET=arm64
.if ${.MAKE.LEVEL} == 0
TARGET_ARCH=${TO_TYPE}
.export TARGET_ARCH
.endif
#
WITH_SYSTEM_COMPILER=
WITH_SYSTEM_LINKER=
#
WITH_LIBCPLUSPLUS=
WITHOUT_BINUTILS_BOOTSTRAP=
WITH_ELFTOOLCHAIN_BOOTSTRAP=
#Disables avoiding bootstrap: WITHOUT_LLVM_TARGET_ALL=
WITH_LLVM_TARGET_AARCH64=
WITH_LLVM_TARGET_ARM=
WITHOUT_LLVM_TARGET_MIPS=
WITHOUT_LLVM_TARGET_POWERPC=
WITHOUT_LLVM_TARGET_RISCV=
WITHOUT_LLVM_TARGET_X86=
WITH_CLANG=
WITH_CLANG_IS_CC=
WITH_CLANG_FULL=
WITH_CLANG_EXTRAS=
WITH_LLD=
WITH_LLD_IS_LD=
WITHOUT_BINUTILS=
WITH_LLDB=
#
WITH_BOOT=
WITHOUT_LIB32=
#
NO_WERROR=
#WERROR=
MALLOC_PRODUCTION=
#
# Avoid stripping but do not control host -g status as well:
DEBUG_FLAGS+=
#
WITH_REPRODUCIBLE_BUILD=
WITH_DEBUG_FILES=
#
# Use of the .clang 's here avoids
# interfering with other C<?>FLAGS
# usage, such as ?= usage.
CFLAGS.clang+= -mcpu=cortex-a72
CXXFLAGS.clang+= -mcpu=cortex-a72
CPPFLAGS.clang+= -mcpu=cortex-a72
ACFLAGS.arm64cpuid.S+= -mcpu=cortex-a72+crypto
ACFLAGS.aesv8-armx.S+= -mcpu=cortex-a72+crypto
ACFLAGS.ghashv8-armx.S+= -mcpu=cortex-a72+crypto

# more ~/src.configs/make.conf 
CFLAGS.gcc+= -v

(But gcc was not in use.)

# more /usr/src/sys/arm64/conf/GENERIC-NODBG
#
# GENERIC -- Custom configuration for the arm64/aarch64
#

include "GENERIC"

ident   GENERIC-NODBG

makeoptions     DEBUG=-g                # Build kernel with gdb(1) debug symbols

options         ALT_BREAK_TO_DEBUGGER

options         KDB                     # Enable kernel debugger support

# For minimum debugger support (stable branch) use:
#options        KDB_TRACE               # Print a stack trace for a panic
options         DDB                     # Enable the kernel debugger

# Extra stuff:
#options        VERBOSE_SYSINIT=0       # Enable verbose sysinit messages
#options        BOOTVERBOSE=1
#options        BOOTHOWTO=RB_VERBOSE
#options        KTR
#options        KTR_MASK=KTR_TRAP
##options       KTR_CPUMASK=0xF
#options        KTR_VERBOSE

# Disable any extra checking for. . .
nooptions       DEADLKRES               # Enable the deadlock resolver
nooptions       INVARIANTS              # Enable calls of extra sanity checking
nooptions       INVARIANT_SUPPORT       # Extra sanity checks of internal structures, required by INVARIANTS
nooptions       WITNESS                 # Enable checks to detect deadlocks and cycles
nooptions       WITNESS_SKIPSPIN        # Don't run witness on spinlocks for speed
nooptions       DIAGNOSTIC
nooptions       MALLOC_DEBUG_MAXZONES   # Separate malloc(9) zones
nooptions       BUF_TRACKING
nooptions       FULL_BUF_TRACKING

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


