poudriere-devel for UFS vs. ZFS: time for "Starting/Cloning builders", aarch64 example (HoneyComb with 16 Cortex-A72's)

From: Mark Millard via freebsd-toolchain <freebsd-toolchain_at_freebsd.org>
Date: Thu, 09 Sep 2021 23:35:11 UTC
The following is from the same system but with different boot media
selected in EDK2's UEFI/ACPI. A summary for a 16-core HoneyComb
Cortex-A72 system is: UFS: 80+ sec; ZFS: roughly 10 sec. I am limited
to at most 16 cores for aarch64.

HoneyComb (16 cpu) UFS (so: NO_ZFS=yes) with USE_TMPFS="data" (main system) :

[00:00:28] Building 476 packages using 16 builders
[00:00:28] Starting/Cloning builders
[00:01:49] Hit CTRL+t at any time to see build progress and stats
[00:01:49] [01] [00:00:00] Building ports-mgmt/pkg | pkg-1.17.1

HoneyComb (16 cpu) ZFS with USE_TMPFS="data" (main system) :

[00:00:13] Building 475 packages using 16 builders
[00:00:13] Starting/Cloning builders
[00:00:23] Hit CTRL+t at any time to see build progress and stats
[00:00:23] [01] [00:00:00] Building ports-mgmt/pkg | pkg-1.17.1

Both of these were after recently booting.

Past experience is that UFS gets more than proportionally worse
as the FreeBSD cpu count goes up. But I've only had 4 and 16 cpus
for aarch64 and until recently had no ZFS context. I had only
4 (2 sockets with 2 cores each) back in my old PowerMac "Quad Core"
usage days, but the effect was demonstrable back then as well. (The
Quad Core PowerMacs bit the dust.)

Past investigations suggested that the parallel cpdup's were
spending much time spinning but not making progress, ending up
showing getblk status during the spinning. I have at times
modified the common.sh script to have the:

       # Jail might be lingering from previous build. Already recursively
       # destroyed all the builder datasets, so just try stopping the jail
       # and ignore any errors
       stop_builder "${id}"
       mkdir -p "${mnt}"
       clonefs ${MASTERMNT} ${mnt} prepkg

code instead executed in a sequential loop just before the
"parallel_start". This helped cut the time for a UFS context
by avoiding the (busy) wait time.
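To illustrate the shape of that change, here is a minimal sketch.
It is not the stock common.sh code: the loop list and the mnt
derivation below are placeholders standing in for whatever the
surrounding common.sh code actually uses.

       # Sketch only, not stock common.sh: do the clone steps one
       # builder at a time before the parallel phase, so the UFS
       # cpdup's do not all contend at once.
       for id in ${JOBS}; do                 # JOBS: builder ids (placeholder)
               mnt="${MASTERMNT%/ref}/${id}" # placeholder mnt derivation
               stop_builder "${id}"
               mkdir -p "${mnt}"
               clonefs ${MASTERMNT} ${mnt} prepkg
       done
       parallel_start
       # ... the rest of the original parallel section, with the
       # stop_builder/mkdir/clonefs lines removed from it.

Run that way, each cpdup gets the filesystem to itself instead of
spinning against the others, which is where the time savings came
from.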

For reference (the ZFS example matches, apart from node names):

# uname -apKU
FreeBSD CA72_UFS 14.0-CURRENT FreeBSD 14.0-CURRENT #13 main-n249019-0637070b5bca-dirty: Sat Sep  4 18:12:39 PDT 2021     root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72  arm64 aarch64 1400032 1400032

# cd /usr/ports
# ~/fbsd-based-on-what-commit.sh 
branch: main
merge-base: b0c4eaac2a3aa9bc422c21b9d398e4dbfea18736
merge-base: CommitDate: 2021-09-07 21:55:24 +0000
b0c4eaac2a3a (HEAD -> main, freebsd/main, freebsd/HEAD) security/suricata: Add patch for upstream locking fix
n557269 (--first-parent --count for merge-base)

But the issue is not new.

The effect is visible on the ThreadRipper 1950X (32 FreeBSD cpus):
(Again, one system but different boot media selected. Note that the
UFS example here is from releng/13.0 and the ZFS example is from
main, so the package counts differ.)

ThreadRipper 1950X (32 cpu) UFS with USE_TMPFS=yes (releng/13.0) :

[00:00:11] Building 111 packages using 32 builders
[00:00:11] Starting/Cloning builders
[00:00:51] Hit CTRL+t at any time to see build progress and stats
[00:00:51] [01] [00:00:00] Building devel/meson | meson-0.59.1

ThreadRipper 1950X (32 cpu) ZFS with USE_TMPFS=yes (main) :

[00:00:07] Building 535 packages using 32 builders
[00:00:07] Starting/Cloning builders
[00:00:18] Hit CTRL+t at any time to see build progress and stats
[00:00:18] [01] [00:00:00] Building ports-mgmt/pkg | pkg-1.17.1

So UFS about 40 sec vs. ZFS about 11 sec.

(I never did the contrasting ZFS case for old PowerMacs and
no longer have access to such.)

I'm guessing that the UFS context would not scale well to having 2
or more times as many FreeBSD cpus. (I do not have access to such.)

I expect that the other places where poudriere does parallel cpdup
activity on UFS also end up hitting the "getblk status" issue in
some way.
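(For anyone wanting to look for the same symptom: the wait channel
shows up with stock FreeBSD tools, nothing poudriere-specific, e.g.:

       # While builders are starting, list the cpdup processes; ones
       # stuck in the buffer cache show getblk in the MWCHAN column.
       ps -axl | grep -E 'MWCHAN|cpdup'

top(1)'s STATE column shows the same wait channel.)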

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)