performance regressions in 15.0
Date: Sat, 06 Dec 2025 10:50:08 UTC
I got pointed at phoronix: https://www.phoronix.com/review/freebsd-15-amd-epyc
While I don't treat their results as gospel, a FreeBSD vs FreeBSD test
showing a slowdown most definitely warrants a closer look.
They observed slowdowns when using iperf over localhost and when compiling llvm.
I can confirm both problems and more.
I found the userspace profiling tooling to be broken again, so I did
not investigate much and I'm not going to dig into it further.
Test box is an AMD EPYC 9454 48-core processor, with the two systems
running as 8-core VMs under KVM.
I. iperf
Package is: iperf3-3.19.1
Tested with: iperf3 -s + iperf3 -c localhost
While the rates fluctuate, 14.3 is overall faster:
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.01 sec 2.70 GBytes 23.1 Gbits/sec
[ 5] 1.01-2.07 sec 1.92 GBytes 15.5 Gbits/sec
[ 5] 2.07-3.01 sec 1.76 GBytes 16.1 Gbits/sec
[ 5] 3.01-4.02 sec 1.86 GBytes 15.9 Gbits/sec
[ 5] 4.02-5.01 sec 2.84 GBytes 24.5 Gbits/sec
[ 5] 5.01-6.02 sec 2.54 GBytes 21.7 Gbits/sec
[ 5] 6.02-7.07 sec 2.18 GBytes 17.8 Gbits/sec
[ 5] 7.07-8.02 sec 1.76 GBytes 15.9 Gbits/sec
[ 5] 8.02-9.01 sec 1.88 GBytes 16.3 Gbits/sec
[ 5] 9.01-10.02 sec 1.90 GBytes 16.2 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate
[ 5] 0.00-10.02 sec 21.3 GBytes 18.3 Gbits/sec receiver
vs 15.0:
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.01 sec 1.85 GBytes 15.7 Gbits/sec
[ 5] 1.01-2.02 sec 3.23 GBytes 27.5 Gbits/sec
[ 5] 2.02-3.03 sec 1.84 GBytes 15.7 Gbits/sec
[ 5] 3.03-4.01 sec 1.86 GBytes 16.3 Gbits/sec
[ 5] 4.01-5.01 sec 1.64 GBytes 14.1 Gbits/sec
[ 5] 5.01-6.07 sec 1.87 GBytes 15.1 Gbits/sec
[ 5] 6.07-7.01 sec 1.23 GBytes 11.3 Gbits/sec
[ 5] 7.01-8.01 sec 1.85 GBytes 15.8 Gbits/sec
[ 5] 8.01-9.01 sec 1.42 GBytes 12.2 Gbits/sec
[ 5] 9.01-10.01 sec 1.81 GBytes 15.5 Gbits/sec
[ 5] 10.01-10.07 sec 99.9 MBytes 14.1 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate
[ 5] 0.00-10.07 sec 18.7 GBytes 16.0 Gbits/sec receiver
This is reliably repeatable.
II. compilation speed
This is the real and serious problem. Both versions of the system ship
the same clang version:
FreeBSD clang version 19.1.7 (https://github.com/llvm/llvm-project.git
llvmorg-19.1.7-0-gcd708029e0b2)
Target: x86_64-unknown-freebsd14.3
Thread model: posix
InstalledDir: /usr/bin
FreeBSD clang version 19.1.7 (https://github.com/llvm/llvm-project.git
llvmorg-19.1.7-0-gcd708029e0b2)
Target: x86_64-unknown-freebsd15.0
Thread model: posix
InstalledDir: /usr/bin
I found that compiling the will-it-scale suite takes about twice the
real time, along with a doubling of time spent in userspace.
will-it-scale needs a little bit of massaging to build; diff at the end.
Check this out (repeatable): while true; do gmake -s clean && time
gmake -s -j 8; done
14.3:
gmake -s -j 8 8.93s user 2.03s system 769% cpu 1.42s (1.424) total
gmake -s -j 8 9.02s user 2.16s system 757% cpu 1.48s (1.475) total
gmake -s -j 8 9.29s user 1.95s system 774% cpu 1.45s (1.450) total
gmake -s -j 8 8.97s user 2.46s system 770% cpu 1.48s (1.484) total
gmake -s -j 8 9.13s user 2.30s system 773% cpu 1.48s (1.477) total
15.0:
gmake -s -j 8 19.90s user 3.02s system 773% cpu 2.96s (2.963) total
gmake -s -j 8 19.90s user 3.18s system 774% cpu 2.98s (2.979) total
gmake -s -j 8 20.24s user 2.90s system 770% cpu 3.00s (3.005) total
gmake -s -j 8 19.92s user 3.25s system 771% cpu 3.00s (3.003) total
gmake -s -j 8 20.25s user 2.95s system 772% cpu 3.01s (3.006) total
user time *skyrocketed*
This is not some weird scheduling anomaly either: while true; do gmake
-s clean && time cpuset -l 1 gmake -s ; done
14.3:
cpuset -l 1 gmake -s 8.88s user 1.11s system 99% cpu 10.00s (10.003) total
cpuset -l 1 gmake -s 8.94s user 1.12s system 99% cpu 10.07s (10.067) total
cpuset -l 1 gmake -s 9.00s user 1.06s system 99% cpu 10.07s (10.072) total
cpuset -l 1 gmake -s 8.88s user 1.17s system 99% cpu 10.07s (10.069) total
cpuset -l 1 gmake -s 8.88s user 1.23s system 99% cpu 10.13s (10.127) total
15.0:
cpuset -l 1 gmake -s 21.58s user 2.33s system 99% cpu 23.96s (23.961) total
cpuset -l 1 gmake -s 21.16s user 2.54s system 99% cpu 23.76s (23.759) total
cpuset -l 1 gmake -s 19.90s user 1.90s system 99% cpu 21.85s (21.854) total
cpuset -l 1 gmake -s 19.76s user 1.74s system 99% cpu 21.55s (21.554) total
cpuset -l 1 gmake -s 19.72s user 1.75s system 99% cpu 21.53s (21.526) total
Per my previous remark, I found userspace profiling to be
non-operational and did not try to fight it.
I did, however, do a few sanity checks, mostly with will-it-scale:
1. The syscall rate is down over 7% (tested with getppid1_processes).
2. malloc also got a slowdown(!). There are 2 benches: one ends up
issuing syscalls, the other does not.
Results in ops/s:
malloc1_processes (malloc/free of 128MB):
14.3: 1960769
15.0: 1376087 (-30%)
malloc2_processes (malloc/free of 1kB):
14.3: 156034491
15.0: 51645759 (-67%)
Apart from that, the kernel is overall slower; for example, negative
path lookups also regressed (-12%).
Another issue is execve rate. To bench that I borrowed the following:
http://apollo.backplane.com/DFlyMisc/doexec.c
cc -O2 doexec.c
cpuset -l 1 ./a.out 1
In ops/s:
14.3: 4905
15.0: 3672 (-25%)
The compilation slowdown may turn out to be clang-specific. Whatever
it is, I think the total slowdown is serious enough that it warrants
investigation and an errata notice. But you do you, I am *not* going
to work on this.
will-it-scale howto:
pkg install gmake hwloc
git clone https://github.com/antonblanchard/will-it-scale
add this:
diff --git a/Makefile b/Makefile
index 8dd0717..d779705 100644
--- a/Makefile
+++ b/Makefile
@@ -1,9 +1,11 @@
-CFLAGS+=-Wall -O2 -g
-LDFLAGS+=-lhwloc
+CFLAGS+=-Wall -O2 -g -I/usr/local/include
+LDFLAGS+=-lhwloc -L/usr/local/lib
processes := $(patsubst tests/%.c,%_processes,$(wildcard tests/*.c))
threads := $(patsubst tests/%.c,%_threads,$(wildcard tests/*.c))
+threadspawn1_processes_FLAGS+=-lpthread
+
all: processes threads
processes: $(processes)
diff --git a/tests/malloc1.c b/tests/malloc1.c
index 14d4c3b..05737bb 100644
--- a/tests/malloc1.c
+++ b/tests/malloc1.c
@@ -12,6 +12,7 @@ void testcase(unsigned long long *iterations,
unsigned long nr)
while (1) {
void *addr = malloc(SIZE);
assert(addr != NULL);
+ asm volatile("" :: "m" (addr));
free(addr);
(*iterations)++;
diff --git a/tests/malloc2.c b/tests/malloc2.c
index c24aceb..e769dd3 100644
--- a/tests/malloc2.c
+++ b/tests/malloc2.c
@@ -12,6 +12,7 @@ void testcase(unsigned long long *iterations,
unsigned long nr)
while (1) {
void *addr = malloc(SIZE);
assert(addr != NULL);
+ asm volatile("" :: "m" (addr));
free(addr);
(*iterations)++;