Re: Ryzen 9 7950X3D bulk -a times: adding an example with SMT disabled (so 16 hardware threads, not 32)
Date: Mon, 13 Nov 2023 02:00:46 UTC
On Nov 9, 2023, at 17:26, Mark Millard <marklmi@yahoo.com> wrote:

> Reading some benchmark results for compilation activity that showed some
> SMT vs. not examples and also using my C++ variant of the old HINT
> benchmark, I ended up curious how a non-SMT from scratch bulk -a would
> end up (ZFS context) compared to my prior SMT based run.
>
> I use a high load average style of bulk -a activity that has USE_TMPFS=all
> involved. The system has 96 GiBytes of RAM (total across the 2 DIMMs).
> The original under 1.5 day time definitely had significant swap space use
> (RAM+SWAP = 96 GiBytes + 364 GiBytes == 460 GiBytes == 471040 MiBytes).
> The media was (and is) a PCIe based Optane 905P 1.5T. ZFS on a single
> partition on the single drive, ZFS used just for bectl reasons, not other
> typical use-ZFS reasons. I've not controlled the ARC size-range explicitly.
>
> So less swap partition use is part of the contribution to the results.
>
> The original bulk -a spent a couple of hours at the end where it was
> just fetching and building textproc/stardict-quick . I have not cleared
> out /usr/ports/distfiles or updated anything.
>
> So fetch time is also a difference here.
>
> SMT (32 hardware threads, original bulk -a):
>
> [33:10:00] [32] [04:37:23] Finished emulators/libretro-mame | libretro-mame-20220124_1: Success
> [35:36:51] [23] [03:44:04] Finished textproc/stardict-quick | stardict-quick-2.4.2_9: Success
> . . .
> [main-amd64-bulk_a-default] [2023-11-01_07h14m50s] [committing:] Queued: 34683 Built: 33826 Failed: 179 Skipped: 358 Ignored: 320 Fetched: 0 Tobuild: 0 Time: 35:37:55
>
> Swap-involved MaxObs (Max Observed) figures:
> 173310Mi MaxObsUsed
> 256332Mi MaxObs(Act+Lndry+SwapUsed)
> 265551Mi MaxObs(Act+Wir+Lndry+SwapUsed)
> (So 265551Mi of 471040Mi RAM+SWAP.)
>
> Just-RAM MaxObs figures:
> 81066Mi MaxObsActive
> (Given the complications of getting usefully comparable wired figures for ZFS (ARC): omit.)
> 94493Mi MaxObs(Act+Wir+Lndry)
>
> Note: MaxObs(A+B+C) <= MaxObs(A)+MaxObs(B)+MaxObs(C)
>
> ALLOW_MAKE_JOBS=yes was used. No explicit restriction on PARALLEL_JOBS
> or MAKE_JOBS_NUMBER (or analogous). So 32 builders allowed, each allowed
> 32 make jobs. This explains the high load averages of the bulk -a :
>
> load averages . . . MaxObs: 360.70, 267.63, 210.84
> (Those need not be all from the same time frame during the bulk -a .)
>
> As for the ports vintage:
>
> # ~/fbsd-based-on-what-commit.sh -C /usr/ports/
> 6ec8e3450b29 (HEAD -> main, freebsd/main, freebsd/HEAD) devel/sdts++: Mark DEPRECATED
> Author: Muhammad Moinur Rahman <bofh@FreeBSD.org>
> Commit: Muhammad Moinur Rahman <bofh@FreeBSD.org>
> CommitDate: 2023-10-21 19:01:38 +0000
> branch: main
> merge-base: 6ec8e3450b29462a590d09fb0b07ed214d456bd5
> merge-base: CommitDate: 2023-10-21 19:01:38 +0000
> n637598 (--first-parent --count for merge-base)
>
> I do have an environment that avoids various LLVM builds taking
> as long to build :
>
> llvm1[3-7] : no MLIR, no FLANG
> llvm1[4-7] : use BE_NATIVE
> other llvm* : use defaults (so, no avoidance)
>
> I also prevent the builds from using strip on most of the install
> materials built (not just toolchain materials).
>
>
> non-SMT (16 hardware threads):
>
> Note one builder (math/fricas), the last still present, was
> stuck and I had to kill processes to have it stop unless I
> was willing to wait for my large timeout figures. The last
> builder normal-finish was:
>
> [39:48:10] [09] [00:16:23] Finished devel/gcc-msp430-ti-toolchain | gcc-msp430-ti-toolchain-9.3.1.2.20210722_1: Success
>
> So, trying to place some bounds for comparing to SMT (32 hw threads)
> and non-SMT (16 hw threads):
>
> 33:10:00 SMT -> 39:48:10 non-SMT would be over 6.5 hrs longer for non-SMT
> 35:36:51 SMT -> 39:48:10 non-SMT would be over 4 hrs longer for non-SMT
>
> As for SMT vs. non-SMT Maximum Observed figures:
>
> SMT load averages . . . MaxObs: 360.70, 267.63, 210.84
> non-SMT load averages . . . MaxObs: 152.89, 100.94, 76.28
>
> Swap-involved MaxObs figures for SMT (32 hw threads) vs not (16):
> 173310Mi vs. 33003Mi MaxObsUsed
> 256332Mi vs. 117221Mi MaxObs(Act+Lndry+SwapUsed)
> 265551Mi vs. 124776Mi MaxObs(Act+Wir+Lndry+SwapUsed)
>
> Just-RAM MaxObs figures for SMT (32 hw threads) vs not (16):
> 81066Mi vs. 69763Mi MaxObsActive
> (Given the complications of getting usefully comparable wired figures for ZFS (ARC): omit.)
> 94493Mi vs. 94303Mi MaxObs(Act+Wir+Lndry)
>

I've added a section for a plot for the 7950X3D to the end of:

https://github.com/markmi/acpphint/blob/master/Some_acpphint_curves_with_notes.md

It is from a C++ variant of the old HINT benchmark and includes
showing RAM caching consequences for the benchmark. The about
32 MiByte and about 96 MiByte cache sizes for the 2 CCDs are
observable.

I'll also note that for the devices present (active and not),
at fully active the 7950X3D seems to use 225 Watts .. 235 Watts
at the power cable for FreeBSD. Idle FreeBSD: more like 96 Watts.
(No video card. 2 forms of Optane 905P 1.5TB, one active. One
Samsung 960 Pro 2TB, inactive. One Samsung 970 EVO Plus 2TB,
inactive. 96 GiBytes of RAM total across 2 DIMMs. Fans and AIO
cooling. Keyboard and mouse USB powered. USB3 Ethernet dongle.
Monitor connection.)


ThreadRipper 1950X "bulk -a" test in progress:

I'm running a from-scratch USE_TMPFS=all "bulk -a" on the
ThreadRipper 1950X (128 GiBytes of RAM). From what I've seen
so far, it looks likely to take over 72 hr, so 2x+ as long
as the 7950X3D. (Samsung 960 Pro 1TB system media and Optane
900 480 GB swap space media in use, 447 GiBytes as I remember.
The ZFS partition on the 960 Pro has ashift=14 .) It has a
slightly modified copy of the ZFS from the 7950X3D as far as
starting content goes. It does have openzfs-2.2 compatibility
fully enabled for its pool, including block cloning, unlike
any other ZFS I have around (openzfs-2.1-freebsd).
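
For anyone wanting to recheck the SMT vs. non-SMT bounds and the
RAM+SWAP total quoted above, here is a minimal Python sketch of the
arithmetic. The function and variable names are hypothetical (they do
not come from poudriere or from any script I use); only the time stamps
and sizes are taken from the quoted figures.

```python
# Sketch only: recompute the elapsed-time bounds and the RAM+SWAP total
# from the figures quoted in this message. All names are illustrative.

def to_seconds(hms: str) -> int:
    """Convert an elapsed-time stamp of the form HH:MM:SS to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def fmt_hms(total_seconds: int) -> str:
    """Format a non-negative count of seconds as HH:MM:SS."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def extra_time(smt_mark: str, non_smt_mark: str) -> int:
    """Seconds by which the non-SMT time stamp exceeds the SMT one."""
    return to_seconds(non_smt_mark) - to_seconds(smt_mark)


def ram_plus_swap_mib(ram_gib: int, swap_gib: int) -> int:
    """RAM+SWAP in MiBytes, given both sizes in GiBytes."""
    return (ram_gib + swap_gib) * 1024


# Elapsed-time stamps quoted from the two bulk -a logs.
NON_SMT_LAST_NORMAL_FINISH = "39:48:10"  # devel/gcc-msp430-ti-toolchain
SMT_MARKS = {
    "emulators/libretro-mame": "33:10:00",
    "textproc/stardict-quick": "35:36:51",
}

if __name__ == "__main__":
    for port, smt_mark in SMT_MARKS.items():
        delta = extra_time(smt_mark, NON_SMT_LAST_NORMAL_FINISH)
        print(f"{smt_mark} SMT ({port}) -> {NON_SMT_LAST_NORMAL_FINISH} "
              f"non-SMT: {fmt_hms(delta)} longer")
    print(f"RAM+SWAP: {ram_plus_swap_mib(96, 364)} MiBytes")
```

The two SMT time stamps bracket the comparison because the SMT run's
last couple of hours were spent just on textproc/stardict-quick
(including its fetch), so neither one alone is a fair match for the
non-SMT last normal finish.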
===
Mark Millard
marklmi at yahoo.com