Re: performance regressions in 15.0 [The Microsoft Dev Kit 2023 buildworld took about 6 minutes less time for jemalloc 5.3.0, not more, for non-debug contexts]
- Reply: Mateusz Guzik : "Re: performance regressions in 15.0 [The Microsoft Dev Kit 2023 buildworld took about 6 minutes less time for jemalloc 5.3.0, not more, for non-debug contexts]"
- In reply to: Mark Millard : "Re: performance regressions in 15.0"
Date: Sun, 07 Dec 2025 16:18:56 UTC
On Dec 6, 2025, at 19:03, Mark Millard <marklmi@yahoo.com> wrote:

> On Dec 6, 2025, at 14:25, Warner Losh <imp@bsdimp.com> wrote:
>
>> On Sat, Dec 6, 2025, 3:06 PM Mark Millard <marklmi@yahoo.com> wrote:
>>
>>> On Dec 6, 2025, at 06:14, Mark Millard <marklmi@yahoo.com> wrote:
>>>
>>>> Mateusz Guzik <mjguzik_at_gmail.com> wrote on
>>>> Date: Sat, 06 Dec 2025 10:50:08 UTC :
>>>>
>>>>> I got pointed at phoronix: https://www.phoronix.com/review/freebsd-15-amd-epyc
>>>>>
>>>>> While I don't treat their results as gospel, a FreeBSD vs FreeBSD test
>>>>> showing a slowdown most definitely warrants a closer look.
>>>>>
>>>>> They observed slowdowns when using iperf over localhost and when compiling llvm.
>>>>>
>>>>> I can confirm both problems and more.
>>>>>
>>>>> I found the profiling tooling for userspace to be broken again so I
>>>>> did not investigate much and I'm not going to dig into it further.
>>>>>
>>>>> Test box is AMD EPYC 9454 48-Core Processor, with the 2 systems
>>>>> running as 8 core vms under kvm.
>>>>> . . .
>>>>
>>>>
>>>>
>>>> Both of the below are from ampere3 (aarch64) instead, its
>>>> 2 most recent "bulk -a" runs that completed, elapsed times
>>>> shown for qt6-webengine-6.9.3 builds:
>>>>
>>>> 150releng-arm64-quarterly qt6-webengine-6.9.3 53:33:46
>>>> 135arm64-default qt6-webengine-6.9.3 38:43:36
>>>>
>>>> For reference:
>>>>
>>>> Host OSVERSION: 1600000
>>>> Jail OSVERSION: 1500068
>>>>
>>>> vs.
>>>>
>>>> Host OSVERSION: 1600000
>>>> Jail OSVERSION: 1305000
>>>>
>>>> The difference for the above is in the Jail's world builds,
>>>> not in the boot's (kernel+world) builds.
>>>>
>>>>
>>>> For reference:
>>>>
>>>>
>>>> https://pkg-status.freebsd.org/ampere3/build.html?mastername=150releng-arm64-quarterly&build=88084f9163ae
>>>>
>>>> build of www/qt6-webengine | qt6-webengine-6.9.3 ended at Sun Nov 30 05:40:02 -00 2025
>>>> build time: 2D:05:33:52
>>>>
>>>>
>>>> https://pkg-status.freebsd.org/ampere3/build.html?mastername=135arm64-default&build=f5384fe59be6
>>>>
>>>> build of www/qt6-webengine | qt6-webengine-6.9.3 ended at Sat Nov 22 15:33:34 -00 2025
>>>> build time: 1D:14:43:41
>>>
>>>
>>> Expanding the notes to before and after jemalloc 5.3.0
>>> was merged to main: beefy18 was the main-amd64 builder
>>> before and somewhat after the jemalloc 5.3.0 merge from
>>> vendor branch:
>>>
>>> Before: p2650762431ca_s51affb7e971 261:29:13 building 36074 port-packages, start 05 Aug 2025 01:10:59 GMT
>>> ( jemalloc 5.3.0 merge from vendor branch: 15 Aug 2025)
>>> After : p9652f95ce8e4_sb45a181a74c 428:49:20 building 36318 port-packages, start 19 Aug 2025 01:30:33 GMT
>>>
>>> (The log files are long gone for port-packages built.)
>>>
>>> main-15 used a debug jail world but 15.0-RELEASE does not.
>>>
>>> I'm not aware of such a port-package builder context for a
>>> non-debug jail world before and after a jemalloc 5.3.0 merge.
>>>
>> A few months before I landed the jemalloc patches, i did 4 or 5 from dirt buildworlds. The elasped time was, iirc, with 1 or 2%. Enough to see maybe a diff with the small sample size, but not enough for ministat to trigger at 95%. I didn't recall keeping the data for this and can't find it now. And I'm not even sure, in hindsight, I ran a good experiment. It might be related, or not, but it would be easy enough for someone to setup a two jails: one just before and one just after. Build from scratch the world (same hash) on both. That would test it since you'd be holding all other variables constant.
>>
>> When we imported the tip of FreeBSD main at work, we didn't get a cpu change trigger from our tests that I recall...
>
>
> The range of commits look like:
>
> • git: 9a7c512a6149 - main - ucred groups: restore a useful comment Eric van Gyzen
> • git: bf6039f09a30 - main - jemalloc: Unthin contrib/jemalloc Warner Losh
> • git: a0dfba697132 - main - jemalloc: Update jemalloc.xml.in per FreeBSD-diffs Warner Losh
> • git: 718b13ba6c5d - main - jemalloc: Add FreeBSD's updates to jemalloc_preamble.h.in Warner Losh
> • git: 6371645df7b0 - main - jemalloc: Add JEMALLOC_PRIVATE_NAMESPACE for the libc namespace Warner Losh
> • git: da260ab23f26 - main - jemalloc: Only replace _pthread_mutex_init_calloc_cb in private namespace Warner Losh
> • git: c43cad871720 - main - jemalloc: Merge from jemalloc 5.3.0 vendor branch Warner Losh
> • git: 69af14a57c9e - main - jemalloc: Note update in UPDATING and RELNOTES Warner Losh
>
> I've started a build of a non-debug 9a7c512a6149 world
> to later create a chroot to do a test buildworld in.
>
> I'll also do a build of a non-debug 69af14a57c9e world
> to later create the other chroot to do a test
> buildworld in.
>
> non-debug means my use of:
>
> WITH_MALLOC_PRODUCTION=
> WITHOUT_ASSERT_DEBUG=
> WITHOUT_PTHREADS_ASSERTIONS=
> WITHOUT_LLVM_ASSERTIONS=
>
> I've used "env WITH_META_MODE=" as it cuts down on the
> volume and frequency of scrolling output. I'll do the
> same later.
>
> If there is anything you want controlled in a different
> way, let me know.
>
> The Windows Dev Kit 2023 is booted (world and kernel)
> with:
>
> # uname -apKU
> FreeBSD aarch64-main-pbase 16.0-CURRENT FreeBSD 16.0-CURRENT main-n281922-4872b48b175c GENERIC-NODEBUG arm64 aarch64 1600004 1600004
>
> which is from an official pkgbase distribution. So the
> boot-world is a debug world but the boot-kernel is not.
>
> The Windows Dev Kit 2023 will take some time for such
> -j8 builds and I may end up sleeping in the middle of
> the sequence someplace.
> So it may be a while before
> I've any comparison/contrast data to report.
>

Summary of buildworld times in *non-debug* contexts, before vs. at jemalloc 5.3.0:

before 5.3.0: 9754 seconds (about 2.7 hrs)
with 5.3.0: 9384 seconds (about 2.6 hrs)

So: 370 seconds (about 6 minutes, or about 3.8%) less time with 5.3.0, but nearly the same.

This does not clarify what is going on with building qt6-webengine-6.9.3, other than suggesting that alternative sources of the slowdown should also be looked for.

Also, it seems that the Mateusz Guzik microbenchmark results do not carry over to this specific type of activity (buildworld) on this specific type of platform.

Details . . .

My two source trees for creating the 2 chroots are:

# ~/fbsd-based-on-what-commit.sh -C /usr/src-jemalloc-5p3p0-before/
9a7c512a6149 (HEAD) ucred groups: restore a useful comment
Author: Eric van Gyzen <vangyzen@FreeBSD.org>
Commit: Eric van Gyzen <vangyzen@FreeBSD.org>
CommitDate: 2025-08-15 13:29:18 +0000

# ~/fbsd-based-on-what-commit.sh -C /usr/src-jemalloc-5p3p0-at/
69af14a57c9e (HEAD) jemalloc: Note update in UPDATING and RELNOTES
Author: Warner Losh <imp@FreeBSD.org>
Commit: Warner Losh <imp@FreeBSD.org>
CommitDate: 2025-08-15 21:57:59 +0000

Both have a src.conf containing:

WITH_MALLOC_PRODUCTION=
WITHOUT_ASSERT_DEBUG=
WITHOUT_PTHREADS_ASSERTIONS=
WITHOUT_LLVM_ASSERTIONS=

since that works for the main 16 in use. (But /etc/src.conf needs to be used in the chroots.)

Having main 16 build /usr/src-jemalloc-5p3p0-before/ :

World build completed on Sat Dec 6 21:24:09 PST 2025
World built in 11817 seconds, ncpu: 8, make -j8

Having main 16 build /usr/src-jemalloc-5p3p0-at/ :

World build completed on Sun Dec 7 00:46:25 PST 2025
World built in 11996 seconds, ncpu: 8, make -j8

(So: not much difference, as expected.)
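As a quick check of the arithmetic behind the summary, here is a small POSIX sh sketch. The `compare_builds` name is a hypothetical helper, not something used in the runs above; it just prints the change in elapsed time in seconds, in minutes, and as a percentage of the "before" time, using the figures reported above.

```sh
#!/bin/sh
# compare_builds BEFORE_SECONDS AFTER_SECONDS  (hypothetical helper)
# Prints AFTER - BEFORE in seconds, in minutes, and as a percentage of
# BEFORE. A negative delta means the AFTER build was faster.
compare_builds() {
    awk -v b="$1" -v a="$2" 'BEGIN {
        d = a - b
        printf "delta: %d s (%.1f min, %.1f%%)\n", d, d / 60, 100 * d / b
    }'
}

# non-debug chroot buildworlds: before jemalloc 5.3.0 vs. at 5.3.0
compare_builds 9754 9384
# main 16 (debug boot world) building the before tree vs. the at tree
compare_builds 11817 11996
```

Run as-is, this prints `delta: -370 s (-6.2 min, -3.8%)` and `delta: 179 s (3.0 min, 1.5%)`. awk is used only because POSIX sh arithmetic is integer-only.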
I then did installation and setup of the two chroot directory trees, creating:

# ls -dC1 /usr/obj/DESTDIRs/jemalloc-5p3p0-*/
/usr/obj/DESTDIRs/jemalloc-5p3p0-at/
/usr/obj/DESTDIRs/jemalloc-5p3p0-before/

Both got an /etc/src.conf containing:

WITH_MALLOC_PRODUCTION=
WITHOUT_ASSERT_DEBUG=
WITHOUT_PTHREADS_ASSERTIONS=
WITHOUT_LLVM_ASSERTIONS=

I then created and, via rsync, populated each of:

# ls -dC1 /usr/obj/DESTDIRs/jemalloc-5p3p0-*/usr/src-jemalloc-5p3p0-*/
/usr/obj/DESTDIRs/jemalloc-5p3p0-at/usr/src-jemalloc-5p3p0-at/
/usr/obj/DESTDIRs/jemalloc-5p3p0-before/usr/src-jemalloc-5p3p0-at/

(So both chroots build the same src-jemalloc-5p3p0-at source; only the world doing the building differs.)

I then did:

# chroot /usr/obj/DESTDIRs/jemalloc-5p3p0-before/
# cd /usr/src-jemalloc-5p3p0-at/
# env WITH_META_MODE= make -j8 buildworld

It resulted in:

World build completed on Sun Dec 7 12:25:45 UTC 2025
World built in 9754 seconds, ncpu: 8, make -j8

(So definitely less time consuming than main 16's debug-world build of the src-jemalloc-5p3p0-at/ source, 11996 seconds, as expected.)

After exiting that chroot, I then did:

# chroot /usr/obj/DESTDIRs/jemalloc-5p3p0-at/
# cd /usr/src-jemalloc-5p3p0-at/
# env WITH_META_MODE= make -j8 buildworld

It resulted in:

World build completed on Sun Dec 7 15:36:41 UTC 2025
World built in 9384 seconds, ncpu: 8, make -j8

So, less time than before jemalloc 5.3.0.

Note: the Microsoft Windows Dev Kit 2023 was using a 1.4 TByte Optane U.2 drive via a USB3 adapter, of all things.

===
Mark Millard
marklmi at yahoo.com
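For anyone wanting to repeat the chroot comparison, the steps above could be scripted roughly as follows. This is only a sketch: `built_seconds`, `run_in_chroot`, the `RUN` switch, and the `/tmp/buildworld-*.log` paths are hypothetical additions; the chroot and source paths are the ones listed above, and the build command uses the `env WITH_META_MODE=` form described earlier in the thread.

```sh
#!/bin/sh
# Hypothetical helper (not from the original runs): repeat the same
# buildworld in each chroot and report the "World built in N seconds"
# figure. Does nothing unless RUN=1; actually running it needs root.
set -eu

DESTDIRS=/usr/obj/DESTDIRs
SRC=/usr/src-jemalloc-5p3p0-at   # same source tree is built in both chroots

# built_seconds: read buildworld output on stdin and print N from the
# "World built in N seconds, ..." summary line (matched anywhere on a line).
built_seconds() {
    sed -n 's/.*World built in \([0-9][0-9]*\) seconds.*/\1/p'
}

# run_in_chroot TREE: TREE is "before" or "at".
run_in_chroot() {
    chroot "$DESTDIRS/jemalloc-5p3p0-$1" /bin/sh -c \
        "cd $SRC && env WITH_META_MODE= make -j8 buildworld"
}

if [ "${RUN:-0}" = 1 ]; then
    for tree in before at; do
        printf '%s: ' "$tree"
        # Keep the full log for later inspection; print only the seconds.
        run_in_chroot "$tree" 2>&1 | tee "/tmp/buildworld-$tree.log" | built_seconds
    done
fi
```

Running the two builds back to back from one script keeps the ordering and invocation identical, which is the point of Warner's "hold all other variables constant" suggestion; a single run per tree is still only one sample each.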