Re: Setting CPUFLAGS breaks aarch64 13.2 -> 14.0 cross compile due to invalid -mcpu=

From: Mark Millard <marklmi_at_yahoo.com>
Date: Sun, 19 Nov 2023 22:06:33 UTC
On Nov 19, 2023, at 10:59, John F Carr <jfc@mit.edu> wrote:

> I have been building 13.2 with the following line in /etc/make.conf:
> 
> CPUTYPE?=armv8a+aes+crc+sha2

I've not found a way through this (so far), at least using
documented interface techniques, but I did run into what
gcc13 does with -mcpu=emag: its assembler run is given
the likes of . . .

/usr/local/bin/as -EL "-march=armv8-a+crc+crypto" . . .

This corresponds to gcc's aarch64-cores.def having:

. . .
/* Do not swap around "emag" and "xgene1",
   this order is required to handle variant correctly. */
AARCH64_CORE("emag",        emag,      xgene1,    V8A,  (CRC, CRYPTO), emag, 0x50, 0x000, 3)

/* APM ('P') cores. */
AARCH64_CORE("xgene1",      xgene1,    xgene1,    V8A,  (), xgene1, 0x50, 0x000, -1)
. . .

From what I've seen, Linux classifies an example emag with:

Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
CPU implementer	: 0x50
CPU architecture: 8
CPU variant	: 0x3
CPU part	: 0x000
CPU revision	: 2

(OS's and compiler toolchains need not use the same terminology.)

From what I've seen, Linux classifies an example xgene2 with:

Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
CPU implementer : 0x50
CPU architecture: 8
CPU variant     : 0x1
CPU part        : 0x000
CPU revision    : 0

(gcc has no xgene2 name)

I got these from:

https://github.com/hrw/arm-socs-table/blob/main/cpuinfo-data/emag
https://github.com/hrw/arm-socs-table/blob/main/cpuinfo-data/x-gene-2

The following notes may or may not help, in some
incomplete/less-supported way.

What I've done is more limited (text taken from an example
context not matching yours):

ACFLAGS.arm64cpuid.S+=  -mcpu=cortex-a72+crypto
ACFLAGS.aesv8-armx.S+=  -mcpu=cortex-a72+crypto
ACFLAGS.ghashv8-armx.S+=        -mcpu=cortex-a72+crypto

Sometimes instructions need to be enabled so that the
OS code that tests for functionality can produce the
instructions that might fail, in turn allowing the
detection to work. Thus, optional parts of the
architecture can need to be enabled in certain places.

I've also done things like:

# Use of the .clang 's here avoids
# interfering with other C<?>FLAGS
# usage, such as ?= usage.
CFLAGS.clang+= -mcpu=cortex-a72
CXXFLAGS.clang+= -mcpu=cortex-a72
CPPFLAGS.clang+= -mcpu=cortex-a72

These avoid the likes of +aes+crc+sha2 in the .clang
usage.
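
One reading of the ?= point in the comment above: once make.conf
has assigned CFLAGS itself, a later CFLAGS?= elsewhere no longer
takes effect, while a separately named per-compiler variable that
gets appended afterwards leaves such a ?= alone. The following is
a minimal standalone sketch of that behavior. The DEMO_* names and
the file name demo.mk are hypothetical, and the final += line only
imitates a late append of the per-compiler flags; it is not copied
from share/mk.

```make
# demo.mk: run with "make -f demo.mk"
# All DEMO_* names are hypothetical, for illustration only.

DEMO_COMPILER_TYPE=	clang

# make.conf style: append to the per-compiler variable only.
DEMO_CFLAGS.clang+=	-mcpu=cortex-a72

# Some later makefile supplies a default. It still applies,
# because DEMO_CFLAGS itself has not been assigned yet.
DEMO_CFLAGS?=	-O2 -pipe

# Imitates a late append of the per-compiler flags.
DEMO_CFLAGS+=	${DEMO_CFLAGS.${DEMO_COMPILER_TYPE}}

show:
	@echo "DEMO_CFLAGS=${DEMO_CFLAGS}"
```

This should print DEMO_CFLAGS=-O2 -pipe -mcpu=cortex-a72. If the
first assignment were DEMO_CFLAGS+= -mcpu=cortex-a72 instead, the
?= line would be skipped and the -O2 -pipe default would be lost.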

[I do not do these sorts of things for the ThreadRipper 1950X
or Ryzen 9 7950X3D, just for notably lower performance
contexts. For example, cortex-a53 (strictly in-order) and
cortex-a72 (out-of-order) are rather different for
optimization, despite having the same -march. I even ran into
a FreeBSD memory model handling bug with my use of
-mcpu=cortex-a72 compared to a standard style build. A
cortex-a53 never showed the issue. A cortex-a72 only showed
the issue with code optimized for the out-of-order handling.
So of the 2*2 possibilities (-mcpu vs. hardware) only 1
combination showed the problem. (The USB subsystem bug that
led to the memory model mishandling was fixed. Such testing
contributes to why I do such things for arm.)]

Somewhat closer to your type of context is an example
where llvm misclassifies the features of a named cpu
and I adjust things to be more accurate:

# Use of the .clang 's here (e.g.) would
# avoid interfering with other C<?>FLAGS
# usage, such as ?= usage. .aarch64 and .armv7
# do more: each stays tied to its own context
# (not-lib32 vs. lib32).
CFLAGS.aarch64+= -mcpu=cortex-x1c+flagm+lse+rcpc
CXXFLAGS.aarch64+= -mcpu=cortex-x1c+flagm+lse+rcpc
CPPFLAGS.aarch64+= -mcpu=cortex-x1c+flagm+lse+rcpc
CFLAGS.armv7+= -mcpu=cortex-a7
CXXFLAGS.armv7+= -mcpu=cortex-a7
CPPFLAGS.armv7+= -mcpu=cortex-a7
LIB32CPUTYPE=cortex-a7
ACFLAGS.arm64cpuid.S+=  -mcpu=cortex-x1c
ACFLAGS.aesv8-armx.S+=  -mcpu=cortex-x1c
ACFLAGS.ghashv8-armx.S+=        -mcpu=cortex-x1c

Note the lack of use of .clang for the above.

This is clearly not using the primary control
interface documented for the build system but
is how I deal with my tailored builds.
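
For a context like the eMAG one in the quoted message, the same
pattern might look roughly like the following. This is an untested
sketch, not something taken from a working build. The feature
string is simply the one from the quoted CPUTYPE line moved into
-march=, CPUTYPE itself is assumed to be left unset, and the lib32
(armv7) side is deliberately left alone.

```make
# /etc/make.conf fragment: untested sketch for an eMAG-like system.
# Assumes CPUTYPE is NOT set, so no CPUTYPE-derived -march=/-mcpu=
# options are generated.

# Going by the comment in the example above, the .aarch64 variables
# are expected to apply only in the not-lib32 context, so the
# armv7 lib32 compiles should not see these armv8 options.
CFLAGS.aarch64+=	-march=armv8a+aes+crc+sha2
CXXFLAGS.aarch64+=	-march=armv8a+aes+crc+sha2
CPPFLAGS.aarch64+=	-march=armv8a+aes+crc+sha2
```

Whether the assembler sources listed earlier would also need
matching ACFLAGS.<file>.S lines in this setup is not something
this sketch answers.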

> This matches my processor (Ampere eMAG), which llvm does not
> know by name.
> 
> Now I want to upgrade to 14.0.  I can't build from source on 13.2.
> Compiling 32 bit objects fails because $CPUTYPE is not valid
> for armv7.  Setting CPUTYPE_32?=armv7 does not work either.
> That generates an invalid compiler option -mcpu=armv7.
> Setting CPUTYPE=armv7 needs to generate only -march=armv7
> and not -mcpu=armv7.  The make infrastructure generates both.
> 
> Using an empty string for CPUTYPE_32 did not work either.
> 
> According to /usr/share/examples/etc/make.conf, I should be
> able to use CPUTYPE=armv7.
> 
> Is this supposed to work?  Is there a /etc/make.conf variable that
> sets -march= but not -mcpu=?
> 
> 
> # Meta data file /usr/obj/usr/src/arm64.aarch64/libexec/rtld-elf32/crtbrand.o.meta
> CMD cc -target aarch64-unknown-freebsd14.0 --sysroot=/usr/obj/usr/src/arm64.aarch64/tmp -B/usr/obj/usr/src/arm64.aarch64/tmp/usr/bin -O2 -pipe -fno-common -march=armv8a+aes+crc+sha2  -mcpu=armv8a+aes+crc+sha2 -m32 -target armv7-unknown-freebsd14.0-gnueabihf  -DCOMPAT_LIBCOMPAT=\"32\"  -DCOMPAT_libcompat=\"32\"  -DCOMPAT_LIB32  --sysroot=/usr/obj/usr/src/arm64.aarch64/tmp   -B/usr/obj/usr/src/arm64.aarch64/tmp/usr/lib32 -Wall -DFREEBSD_ELF -DIN_RTLD -ffreestanding -I/usr/src/lib/csu/common -I/usr/src/libexec/rtld-elf/arm -I/usr/src/libexec/rtld-elf -fpic -DPIC  -I/usr/src/libexec/rtld-elf/rtld-libc -mfpu=none -g -gz=zlib -std=gnu99 -Wno-format-zero-length -Wsystem-headers -Werror -Wall -Wno-format-y2k -W -Wno-unused-parameter -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wreturn-type -Wcast-qual -Wwrite-strings -Wswitch -Wshadow -Wunused-parameter -Wchar-subscripts -Wnested-externs -Wold-style-definition -Wno-pointer-sign -Wdate-time -Wformat=2 -Wno-format-extra-args -Werror -Wmissing-variable-declarations -Wthread-safety -Wno-empty-body -Wno-string-plus-int -Wno-unused-const-variable -Wno-error=unused-but-set-parameter  -Qunused-arguments     -DLOCORE   -c /usr/src/lib/csu/common/crtbrand.S -o crtbrand.o
> CMD  CWD /usr/obj/usr/src/arm64.aarch64/libexec/rtld-elf32
> TARGET crtbrand.o
> OODATE /usr/src/lib/csu/common/crtbrand.S
> -- command output --
> clang: error: unsupported argument 'armv8a+aes+crc+sha2' to option '-mcpu='
> clang: error: ignoring extension 'sha2' because the 'invalid' architecture does not support it [-Werror,-Winvalid-command-line-argument]
> clang: error: ignoring extension 'aes' because the 'invalid' architecture does not support it [-Werror,-Winvalid-command-line-argument]
> clang: error: unsupported argument 'armv8a+aes+crc+sha2' to option '-mcpu='
> clang: error: ignoring extension 'sha2' because the 'invalid' architecture does not support it [-Werror,-Winvalid-command-line-argument]
> clang: error: ignoring extension 'aes' because the 'invalid' architecture does not support it [-Werror,-Winvalid-command-line-argument]
> 


===
Mark Millard
marklmi at yahoo.com