RPi4B buildworld buildkernel times for already installed system being -mcpu=cortex-a72 vs. -mcpu=cortex-a53 based
Mark Millard
marklmi at yahoo.com
Mon Sep 28 15:36:48 UTC 2020
[Turns out, when sdram_freq_min=3200 is effective, -j4 builds are faster
than -j3 builds by about an hour (holding other configuration conditions
constant).]
On 2020-Sep-27, at 11:07, Mark Millard <marklmi at yahoo.com> wrote:
> On 2020-Sep-20, at 18:40, Mark Millard <marklmi at yahoo.com> wrote:
>
>> On 2020-Sep-20, at 18:32, Mark Millard <marklmi at yahoo.com> wrote:
>>
>>> The following are from scratch buildworld buildkernel rebuilds
>>> on a RPi4B (head -r363590 context).
>>>
>>> ENVIRONMENT: -mcpu=cortex-a72 based world and kernel running already, RPi4B @ 2G Hz,
>>> Restricted to 3 GiByte RAM, -j3:
>>>
>>> World built in 37469 seconds, ncpu: 4, make -j3
>>> Kernel(s) GENERIC-NODBG built in 2474 seconds, ncpu: 4, make -j3
>>>
>>> ENVIRONMENT: -mcpu=cortex-a53 based kernel running, RPi4B @ 2G Hz,
>>> Restricted to 3 GiByte RAM, -j3:
>>>
>>> World built in 44034 seconds, ncpu: 4, make -j3
>>> Kernel(s) GENERIC-NODBG built in 2895 seconds, ncpu: 4, make -j3
>>>
>>> So a little under 11.1 hr total vs. a little over 13.0 hr total,
>>> a somewhat over 50 min improvement.
>>
>> "a somewhat over 1hr 50 min improvement" is what I should have
>> managed to type.
>>
>>> (A xhci patch finally allowed me to boot -mcpu=cortex-a72
>>> based kernel builds on the RPi4B: The xhci event ring
>>> initialization code was missing a usb_bus_mem_flush_all
>>> call previously.)
>>>
>>>
>>> Supporting details:
>>>
>>> (e-mail based spacing changes expected below)
>>>
>>> # diff -u ~/src.configs/src.conf.cortexA72-clang-bootstrap.aarch64-host ~/src.configs/src.conf.cortexA53-clang-bootstrap.aarch64-host
>>> --- /root/src.configs/src.conf.cortexA72-clang-bootstrap.aarch64-host 2020-03-13 22:29:25.470155000 -0700
>>> +++ /root/src.configs/src.conf.cortexA53-clang-bootstrap.aarch64-host 2020-03-13 22:29:25.469455000 -0700
>>> @@ -49,9 +49,9 @@
>>> # Use of the .clang 's here avoids
>>> # interfering with other C<?>FLAGS
>>> # usage, such as ?= usage.
>>> -CFLAGS.clang+= -mcpu=cortex-a72
>>> -CXXFLAGS.clang+= -mcpu=cortex-a72
>>> -CPPFLAGS.clang+= -mcpu=cortex-a72
>>> -ACFLAGS.arm64cpuid.S+= -mcpu=cortex-a72+crypto
>>> -ACFLAGS.aesv8-armx.S+= -mcpu=cortex-a72+crypto
>>> -ACFLAGS.ghashv8-armx.S+= -mcpu=cortex-a72+crypto
>>> +CFLAGS.clang+= -mcpu=cortex-a53
>>> +CXXFLAGS.clang+= -mcpu=cortex-a53
>>> +CPPFLAGS.clang+= -mcpu=cortex-a53
>>> +ACFLAGS.arm64cpuid.S+= -mcpu=cortex-a53+crypto
>>> +ACFLAGS.aesv8-armx.S+= -mcpu=cortex-a53+crypto
>>> +ACFLAGS.ghashv8-armx.S+= -mcpu=cortex-a53+crypto
>>>
>>>
>>> The .amd64-host files are similar for doing cross builds.
>>>
>>> I also use += in secure/lib/libcrypto/Makefile :
>>>
>>> # svnlite diff /usr/src/secure/lib/libcrypto/Makefile
>>> Index: /usr/src/secure/lib/libcrypto/Makefile
>>> ===================================================================
>>> --- /usr/src/secure/lib/libcrypto/Makefile (revision 365919)
>>> +++ /usr/src/secure/lib/libcrypto/Makefile (working copy)
>>> @@ -20,7 +20,7 @@
>>> SRCS+= o_str.c o_time.c threads_pthread.c uid.c
>>> .if defined(ASM_aarch64)
>>> SRCS+= arm64cpuid.S armcap.c
>>> -ACFLAGS.arm64cpuid.S= -march=armv8-a+crypto
>>> +ACFLAGS.arm64cpuid.S+= -march=armv8-a+crypto
>>> .elif defined(ASM_amd64)
>>> SRCS+= x86_64cpuid.S
>>> .elif defined(ASM_arm)
>>> @@ -35,7 +35,7 @@
>>> SRCS+= aes_cbc.c aes_cfb.c aes_ecb.c aes_ige.c aes_misc.c aes_ofb.c aes_wrap.c
>>> .if defined(ASM_aarch64)
>>> SRCS+= aes_core.c aesv8-armx.S vpaes-armv8.S
>>> -ACFLAGS.aesv8-armx.S= -march=armv8-a+crypto
>>> +ACFLAGS.aesv8-armx.S+= -march=armv8-a+crypto
>>> .elif defined(ASM_amd64)
>>> SRCS+= aes_core.c aesni-mb-x86_64.S aesni-sha1-x86_64.S aesni-sha256-x86_64.S
>>> SRCS+= aesni-x86_64.S vpaes-x86_64.S
>>> @@ -242,7 +242,7 @@
>>> SRCS+= ofb128.c wrap128.c xts128.c
>>> .if defined(ASM_aarch64)
>>> SRCS+= ghashv8-armx.S
>>> -ACFLAGS.ghashv8-armx.S= -march=armv8-a+crypto
>>> +ACFLAGS.ghashv8-armx.S+= -march=armv8-a+crypto
>>> .elif defined(ASM_amd64)
>>> SRCS+= aesni-gcm-x86_64.S ghash-x86_64.S
>>> .elif defined(ASM_arm)
>>>
>>> The RPi4B is using:
>>>
>>> over_voltage=6
>>> arm_freq=2000
>>>
>>> and was booted via uefi/ACPI.
>>>
>>> I have not repeated the -j4 or other -jN comparisons that
>>> I reported in the past. The -mcpu=cortex-a53 figures are
>>> from the past.
>
> The following new timing is based on head -r365932 rebuilding
> itself where the 8 GiByte RPi4B config.txt ended with:
>
> over_voltage=6
> arm_freq=2000
> sdram_freq_min=3200
>
> and the boot was via u-boot, no RAM restriction. (The
> sdram_freq_min assignment does not seem to do anything
> for rpi4-uefi-devel v1.20 uefi/ACPI based booting.)
> /etc/sysctl.conf has: dev.cpu.0.freq=2000 . No use of
> powerd or other such.
>
>
> ENVIRONMENT: -mcpu=cortex-a72 based world and kernel running already,
> 8 GiByte RPi4B @ 2G Hz with sdram_freq_min=3200, u-boot style boot, -j3:
>
> World built in 31852 seconds, ncpu: 4, make -j3
> Kernel(s) GENERIC-NODBG built in 2059 seconds, ncpu: 4, make -j3
>
> So somewhat under 9.5 hr overall.
>
>
> That means somewhat over 3.5 hours faster than a -mcpu=cortex-a53
> based system without sdram_freq_min=3200 using 3 GiByte RAM
> but still RPi4B @ 2G Hz (uefi/ACPI boot):
>
> World built in 44034 seconds, ncpu: 4, make -j3
> Kernel(s) GENERIC-NODBG built in 2895 seconds, ncpu: 4, make -j3
>
> (Same as reported in prior messages.)
>
> But the prior -r363590 vs. the now -r365932 means there is more varying
> than in my previous comparisons. For example, clang 10 vs. clang 11.
>
> I'm probably going to run a -j4 build to see how it compares in
> this context.
ENVIRONMENT: -mcpu=cortex-a72 based world and kernel running already,
8 GiByte RPi4B dev.cpu.0.freq=2000 with sdram_freq_min=3200, u-boot
style boot, -j4:

World built in 28526 seconds, ncpu: 4, make -j4
Kernel(s) GENERIC-NODBG built in 1841 seconds, ncpu: 4, make -j4

So somewhat under 8.5 hr overall.

That means somewhat over 4.5 hours faster than a -mcpu=cortex-a53
based system without sdram_freq_min=3200 using 3 GiByte RAM
but still RPi4B @ 2G Hz (uefi/ACPI boot).
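
The hour figures quoted in this thread can be rechecked from the reported
seconds. The following is a minimal Python sketch for doing so, not
something that was run as part of the builds. The dictionary labels and
the helper name total_hours are made up here for illustration; only the
second counts come from the build summaries in this thread.

```python
# Reported "World built in" / "Kernel(s) ... built in" seconds from this thread.
# The labels are illustrative names, not anything the build itself prints.
BUILDS = {
    "a53 -j3 (3 GiByte, uefi/ACPI)": (44034, 2895),
    "a72 -j3 (3 GiByte, uefi/ACPI)": (37469, 2474),
    "a72 -j3 (sdram_freq_min=3200, u-boot)": (31852, 2059),
    "a72 -j4 (sdram_freq_min=3200, u-boot)": (28526, 1841),
}


def total_hours(world_s, kernel_s):
    """Return the combined buildworld + buildkernel time in hours."""
    return (world_s + kernel_s) / 3600.0


# Use the -mcpu=cortex-a53 based -j3 run as the baseline for comparisons.
baseline = total_hours(*BUILDS["a53 -j3 (3 GiByte, uefi/ACPI)"])

for label, (world_s, kernel_s) in BUILDS.items():
    hours = total_hours(world_s, kernel_s)
    print(f"{label}: {hours:.2f} hr total, "
          f"{baseline - hours:.2f} hr faster than a53 -j3")
```

The totals come out at roughly 13.0, 11.1, 9.4 and 8.4 hr, which agrees
with the "a little over 13.0 hr", "a little under 11.1 hr", "somewhat
under 9.5 hr" and "somewhat under 8.5 hr" wording, and the -j4 vs. -j3
difference with sdram_freq_min=3200 is just under an hour.
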
> I've not run a default arm_freq/sdram_freq_min/dev.cpu.0.freq buildworld
> buildkernel in a long time and so do not have reasonable comparison
> figures relative to that type of context. I do not plan on such an
> experiment.
>
>
> I'll note that I run these tests with a monitor connected that sits
> with a static login prompt display after booting. I do not test
> with X11 or other use that might significantly compete for more power.
> The serial port console is usually used. I have used ssh sometimes in
> the past.
>
> ~/src.configs/src.conf.cortexA72-clang-bootstrap.aarch64-host is still
> unchanged:
>
> # more ~/src.configs/src.conf.cortexA72-clang-bootstrap.aarch64-host
> TO_TYPE=aarch64
> #
> KERNCONF=GENERIC-NODBG
> TARGET=arm64
> .if ${.MAKE.LEVEL} == 0
> TARGET_ARCH=${TO_TYPE}
> .export TARGET_ARCH
> .endif
> #
> #WITH_CROSS_COMPILER=
> WITH_SYSTEM_COMPILER=
> WITH_SYSTEM_LINKER=
> #
> WITH_LIBCPLUSPLUS=
> #WITH_LLD_BOOTSTRAP=
> WITHOUT_BINUTILS_BOOTSTRAP=
> WITH_ELFTOOLCHAIN_BOOTSTRAP=
> #Disables avoiding bootstrap: WITHOUT_LLVM_TARGET_ALL=
> WITH_LLVM_TARGET_AARCH64=
> WITH_LLVM_TARGET_ARM=
> WITHOUT_LLVM_TARGET_MIPS=
> WITHOUT_LLVM_TARGET_POWERPC=
> WITHOUT_LLVM_TARGET_RISCV=
> WITHOUT_LLVM_TARGET_X86=
> #WITH_CLANG_BOOTSTRAP=
> WITH_CLANG=
> WITH_CLANG_IS_CC=
> WITH_CLANG_FULL=
> WITH_CLANG_EXTRAS=
> WITH_LLD=
> WITH_LLD_IS_LD=
> WITHOUT_BINUTILS=
> WITH_LLDB=
> #
> WITH_BOOT=
> WITHOUT_LIB32=
> #
> #
> NO_WERROR=
> #WERROR=
> MALLOC_PRODUCTION=
> #
> # Avoid stripping but do not control host -g status as well:
> DEBUG_FLAGS+=
> #
> WITH_REPRODUCIBLE_BUILD=
> WITH_DEBUG_FILES=
> #
> # Use of the .clang 's here avoids
> # interfering with other C<?>FLAGS
> # usage, such as ?= usage.
> CFLAGS.clang+= -mcpu=cortex-a72
> CXXFLAGS.clang+= -mcpu=cortex-a72
> CPPFLAGS.clang+= -mcpu=cortex-a72
> ACFLAGS.arm64cpuid.S+= -mcpu=cortex-a72+crypto
> ACFLAGS.aesv8-armx.S+= -mcpu=cortex-a72+crypto
> ACFLAGS.ghashv8-armx.S+= -mcpu=cortex-a72+crypto
>
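
As the earlier diff shows, the cortexA72 and cortexA53 src.conf files
differ only in the -mcpu value on six lines. A small Python sketch of
that relationship follows. The helper name tuning_lines is hypothetical
and this is not how the files were actually produced; it only
regenerates the six CPU-specific lines for a given CPU name.

```python
# Assembly sources that get "+crypto" appended, as listed in the src.conf above.
ASM_FILES = ("arm64cpuid.S", "aesv8-armx.S", "ghashv8-armx.S")


def tuning_lines(cpu):
    """Return the six CPU-specific src.conf lines for one -mcpu value.

    Hypothetical helper for illustration only.
    """
    lines = [f"{var}.clang+= -mcpu={cpu}"
             for var in ("CFLAGS", "CXXFLAGS", "CPPFLAGS")]
    lines += [f"ACFLAGS.{name}+= -mcpu={cpu}+crypto" for name in ASM_FILES]
    return lines


# Print the cortex-a72 variant; pass "cortex-a53" for the other file's lines.
print("\n".join(tuning_lines("cortex-a72")))
```
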
===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)