RPi4B and self-hosted buildworld buildkernel times: using more than -j3 is a waste in my tests.
Mark Millard
marklmi at yahoo.com
Sat Aug 15 07:24:06 UTC 2020
Self-hosted, from-scratch buildworld buildkernel times
(head -r363590, non-debug build; more context notes
later):
RPi4B limited to 3072 MiByte of RAM:
-j4 buildworld: 44783 sec (a little under 12.5 hours)
-j3 buildworld: 44034 sec (a little under 12.3 hours)
-j2 buildworld: 49070 sec (a little under 13.7 hours)
-j1 buildworld: 71083 sec (a little under 19.8 hours)
-j4 buildkernel: 2876 sec (a little under 48 minutes)
-j3 buildkernel: 2895 sec (a little under 49 minutes)
-j2 buildkernel: 3289 sec (a little under 55 minutes)
-j1 buildkernel: 4866 sec (a little under 82 minutes)
So: -j4 does not meaningfully cut the time required compared
to -j3 (buildworld was 749 sec slower, buildkernel only 19 sec
faster). It appears that larger -jN figures would also not cut
the time compared to -j3.
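As a rough cross-check of the figures above, the speedup relative to -j1 can be computed directly from the reported seconds. This is only a sketch: the dictionary layout and the name speedups are made up for illustration, and the numbers are copied from the RPi4B list above.

```python
# Wall-clock seconds copied from the RPi4B figures above,
# keyed by the -jN value used for the build.
rpi4b = {
    "buildworld": {1: 71083, 2: 49070, 3: 44034, 4: 44783},
    "buildkernel": {1: 4866, 2: 3289, 3: 2895, 4: 2876},
}


def speedups(times):
    """Return {jobs: (speedup vs. -j1, speedup divided by jobs)}."""
    base = times[1]
    return {j: (base / t, base / t / j) for j, t in sorted(times.items())}


if __name__ == "__main__":
    for target, times in rpi4b.items():
        for jobs, (speedup, per_job) in speedups(times).items():
            print(f"{target} -j{jobs}: {speedup:.2f}x vs -j1, "
                  f"{per_job:.2f} per job")
```

For buildworld this gives about 1.61x at -j3 and about 1.59x at -j4, which is the same conclusion as above expressed as ratios.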
Context notes:
Build commands had "buildworld buildkernel" on the
command lines.
UEFI/ACPI based boot (v1.17) for the RPi4B.
Each "buildworld buildkernel" was from-scratch and using
the same src.conf and make.conf files (under other names).
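For anyone wanting to repeat this kind of measurement, a minimal timing harness might look like the following. It is a sketch under assumptions: the message does not say how the renamed src.conf and make.conf files were selected, so the use of SRCCONF= and __MAKE_CONF= here is a guess, and build_command and timed_run are illustrative names. It measures the combined elapsed time only; it does not produce the separate buildworld and buildkernel figures reported above.

```python
import os
import subprocess
import time

# Paths taken from the "For reference" listings in this message.
SRCCONF = "~/src.configs/src.conf.cortexA72-clang-bootstrap.aarch64-host"
MAKECONF = "~/src.configs/make.conf"


def build_command(jobs, srcconf=SRCCONF, makeconf=MAKECONF):
    """Return the argv for one combined buildworld buildkernel run.

    Assumption: the alternate files are selected with SRCCONF= and
    __MAKE_CONF=; the original message does not say how this was done.
    """
    return [
        "make", f"-j{jobs}", "buildworld", "buildkernel",
        "SRCCONF=" + os.path.expanduser(srcconf),
        "__MAKE_CONF=" + os.path.expanduser(makeconf),
    ]


def timed_run(argv, cwd=None):
    """Run argv and return (exit status, elapsed wall-clock seconds)."""
    start = time.monotonic()
    status = subprocess.run(argv, cwd=cwd).returncode
    return status, time.monotonic() - start


if __name__ == "__main__":
    # Only print the command lines here. To actually build, clear the
    # object tree first and call timed_run(cmd, cwd="/usr/src") by hand.
    for jobs in (1, 2, 3, 4):
        print(" ".join(build_command(jobs)))
```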
The file system is on a USB3 SSD and no sdcard is involved.
The RAM in use is limited to 3072 MiByte in order to avoid the
DMA handling problems that would otherwise happen.
over_voltage=6 and arm_freq=2000 were in use. This makes
the cortex-A72 clock rate match the MACCHIATObin Double
Shot that I have access to (2 GHz). The MACCHIATObin got:
-j4 buildworld: 18789 sec (a little under 5.3 hours)
-j1 buildworld: 54331 sec (a little under 15.1 hours)
-j4 buildkernel: 1296 sec (a little under 22 minutes)
-j1 buildkernel: 3800 sec (a little under 63.4 minutes)
So: much less time required compared to the RPi4B at the
same clock rate. (The MACCHIATObin has a SATA SSD but
buildworld buildkernel is not I/O bound.)
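To put numbers on "much less time", the two machines can be compared at matching -jN values. The sketch below only divides the figures reported in this message; the names rpi4b, mcbin and ratios are illustrative.

```python
# Wall-clock seconds copied from this message, keyed by (target, jobs).
rpi4b = {
    ("buildworld", 4): 44783, ("buildworld", 1): 71083,
    ("buildkernel", 4): 2876, ("buildkernel", 1): 4866,
}
mcbin = {
    ("buildworld", 4): 18789, ("buildworld", 1): 54331,
    ("buildkernel", 4): 1296, ("buildkernel", 1): 3800,
}


def ratios(slow, fast):
    """Return slow/fast time ratios for the keys both tables share."""
    return {key: slow[key] / fast[key] for key in slow if key in fast}


if __name__ == "__main__":
    for (target, jobs), ratio in sorted(ratios(rpi4b, mcbin).items()):
        print(f"{target} -j{jobs}: RPi4B took {ratio:.2f}x as long")
```

The gap is about 1.3x at -j1 but about 2.2x to 2.4x at -j4, so most of the difference shows up once several cores are busy at the same time.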
There are huge differences in the effectiveness of the
RAM caches and possibly other aspects related to RAM access.
I checked with a benchmark program that exposes some overall
effects of such variations and that allows testing with
various thread counts.
In the benchmarking, over the range of problem sizes that
fit in the L1 & L2 caches, the RPi4B and MACCHIATObin were a
close match. But as problem sizes grew to much larger than the
caches, the difference became large, especially for the
likes of -j4.
(An OverDrive 1000 with its cortex-A57 @1.7 GHz takes
even less time: again, the RAM caches and/or other aspects
of RAM access contribute greatly.)
For reference:
# more ~/src.configs/src.conf.cortexA72-clang-bootstrap.aarch64-host
TO_TYPE=aarch64
#
KERNCONF=GENERIC-NODBG
TARGET=arm64
.if ${.MAKE.LEVEL} == 0
TARGET_ARCH=${TO_TYPE}
.export TARGET_ARCH
.endif
#
WITH_SYSTEM_COMPILER=
WITH_SYSTEM_LINKER=
#
WITH_LIBCPLUSPLUS=
WITHOUT_BINUTILS_BOOTSTRAP=
WITH_ELFTOOLCHAIN_BOOTSTRAP=
#Disables avoiding bootstrap: WITHOUT_LLVM_TARGET_ALL=
WITH_LLVM_TARGET_AARCH64=
WITH_LLVM_TARGET_ARM=
WITHOUT_LLVM_TARGET_MIPS=
WITHOUT_LLVM_TARGET_POWERPC=
WITHOUT_LLVM_TARGET_RISCV=
WITHOUT_LLVM_TARGET_X86=
WITH_CLANG=
WITH_CLANG_IS_CC=
WITH_CLANG_FULL=
WITH_CLANG_EXTRAS=
WITH_LLD=
WITH_LLD_IS_LD=
WITHOUT_BINUTILS=
WITH_LLDB=
#
WITH_BOOT=
WITHOUT_LIB32=
#
NO_WERROR=
#WERROR=
MALLOC_PRODUCTION=
#
# Avoid stripping but do not control host -g status as well:
DEBUG_FLAGS+=
#
WITH_REPRODUCIBLE_BUILD=
WITH_DEBUG_FILES=
#
# Use of the .clang 's here avoids
# interfering with other C<?>FLAGS
# usage, such as ?= usage.
CFLAGS.clang+= -mcpu=cortex-a72
CXXFLAGS.clang+= -mcpu=cortex-a72
CPPFLAGS.clang+= -mcpu=cortex-a72
ACFLAGS.arm64cpuid.S+= -mcpu=cortex-a72+crypto
ACFLAGS.aesv8-armx.S+= -mcpu=cortex-a72+crypto
ACFLAGS.ghashv8-armx.S+= -mcpu=cortex-a72+crypto
# more ~/src.configs/make.conf
CFLAGS.gcc+= -v
(But gcc was not in use.)
# more /usr/src/sys/arm64/conf/GENERIC-NODBG
#
# GENERIC -- Custom configuration for the arm64/aarch64
#
include "GENERIC"
ident GENERIC-NODBG
makeoptions DEBUG=-g # Build kernel with gdb(1) debug symbols
options ALT_BREAK_TO_DEBUGGER
options KDB # Enable kernel debugger support
# For minimum debugger support (stable branch) use:
#options KDB_TRACE # Print a stack trace for a panic
options DDB # Enable the kernel debugger
# Extra stuff:
#options VERBOSE_SYSINIT=0 # Enable verbose sysinit messages
#options BOOTVERBOSE=1
#options BOOTHOWTO=RB_VERBOSE
#options KTR
#options KTR_MASK=KTR_TRAP
##options KTR_CPUMASK=0xF
#options KTR_VERBOSE
# Disable any extra checking for. . .
nooptions DEADLKRES # Enable the deadlock resolver
nooptions INVARIANTS # Enable calls of extra sanity checking
nooptions INVARIANT_SUPPORT # Extra sanity checks of internal structures, required by INVARIANTS
nooptions WITNESS # Enable checks to detect deadlocks and cycles
nooptions WITNESS_SKIPSPIN # Don't run witness on spinlocks for speed
nooptions DIAGNOSTIC
nooptions MALLOC_DEBUG_MAXZONES # Separate malloc(9) zones
nooptions BUF_TRACKING
nooptions FULL_BUF_TRACKING
===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)