Possible regression in main causing poor performance

From: Glen Barber <gjb_at_freebsd.org>
Date: Sat, 19 Aug 2023 00:10:59 UTC
I am somewhat inclined to look in the direction of ZFS here, as two
things changed:

1) the build machine in question was recently (as in a week and a half
   ago) upgraded to the tip of main in order to ease the transition of
   this machine from building 14.x to building 15.x;
2) there is the recent addition of building ZFS-backed virtual machine
   and cloud images.

Here is a bit of the history of the build machines:

- The second and first build 13.x and 14.x, respectively.  The third was
  (at the time it was purchased) newer than the other two, and is
  significantly faster when it comes to weekly snapshots.  I'll refer to
  these machines as such: first, second, and third.

- The third machine will eventually be used to build 14.0-RELEASE.  The
  first will continue on main, where it will build 15.0-CURRENT
  snapshots.  The second will continue tracking 13-STABLE.

The first machine runs:
  # uname -a
  FreeBSD releng1.nyi.freebsd.org 14.0-CURRENT FreeBSD 14.0-CURRENT \
    amd64 1400093 #5 main-n264224-c84617e87a70: Wed Jul 19 19:10:38 UTC 2023

Last week's snapshot builds were completed in a reasonable amount of
time:

  root@releng1.nyi:/releng/scripts-snapshot/scripts # ./thermite.sh -c ./builds-14.conf ; echo ^G
  20230811-00:03:11       INFO:   Creating /releng/scripts-snapshot/logs
  20230811-00:03:11       INFO:   Creating /releng/scripts-snapshot/chroots
  20230811-00:03:12       INFO:   Creating /releng/scripts-snapshot/release
  20230811-00:03:12       INFO:   Creating /releng/scripts-snapshot/ports
  20230811-00:03:12       INFO:   Creating /releng/scripts-snapshot/doc
  20230811-00:03:13       INFO:   Checking out https://git.FreeBSD.org//src.git (main) to /releng/scripts-snapshot/release
  [...]
  20230811-15:11:13       INFO:   Staging for ftp: 14-i386-GENERIC-snap
  20230811-16:27:28       INFO:   Staging for ftp: 14-amd64-GENERIC-snap
  20230811-16:33:43       INFO:   Staging for ftp: 14-aarch64-GENERIC-snap

Overall, 17 hours, including the time to upload the EC2, Vagrant, and
GCE images.

With no changes to the system, no stale ZFS datasets lying around from
last week (everything is a pristine environment, etc.), this week's
builds are taking forever:

  root@releng1.nyi:/releng/scripts-snapshot/scripts # ./thermite.sh -c ./builds-14.conf ; echo ^G
  20230818-00:15:44       INFO:   Creating /releng/scripts-snapshot/logs
  20230818-00:15:44       INFO:   Creating /releng/scripts-snapshot/chroots
  20230818-00:15:45       INFO:   Creating /releng/scripts-snapshot/release
  20230818-00:15:45       INFO:   Creating /releng/scripts-snapshot/ports
  20230818-00:15:45       INFO:   Creating /releng/scripts-snapshot/doc
  20230818-00:15:46       INFO:   Checking out https://git.FreeBSD.org//src.git (main) to /releng/scripts-snapshot/release
  [...]
  20230818-18:46:22       INFO:   Staging for ftp: 14-aarch64-ROCKPRO64-snap
  20230818-20:41:02       INFO:   Staging for ftp: 14-riscv64-GENERIC-snap
  20230818-22:54:49       INFO:   Staging for ftp: 14-amd64-GENERIC-snap

Note, it is just about 4 minutes past 00:00 UTC as of this writing, so
we are about to cross well over the 24-hour mark, and cloud provider
images have not yet even started.
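
To put numbers on the two runs, here is a minimal sketch that computes
the wall-clock time between the first and last log lines quoted above.
It assumes only the timestamp layout visible in those lines; the
function name and format constant are illustrative and not part of
thermite.sh.  Note that the last line quoted for this week is just the
latest progress at the time of writing, not the end of the run.

```python
from datetime import datetime, timedelta

# Timestamp layout as it appears in the quoted log lines (assumed from
# the excerpts, e.g. "20230811-00:03:11").
STAMP_FORMAT = "%Y%m%d-%H:%M:%S"


def elapsed(first_line: str, last_line: str) -> timedelta:
    """Return the wall-clock time between two log lines."""
    start = datetime.strptime(first_line.split()[0], STAMP_FORMAT)
    end = datetime.strptime(last_line.split()[0], STAMP_FORMAT)
    return end - start


# First and last lines quoted from each week's run.
last_week = elapsed(
    "20230811-00:03:11       INFO:   Creating /releng/scripts-snapshot/logs",
    "20230811-16:33:43       INFO:   Staging for ftp: 14-aarch64-GENERIC-snap",
)
this_week = elapsed(
    "20230818-00:15:44       INFO:   Creating /releng/scripts-snapshot/logs",
    "20230818-22:54:49       INFO:   Staging for ftp: 14-amd64-GENERIC-snap",
)

print("last week:", last_week)  # 16:30:32
print("this week:", this_week)  # 22:39:05
```

So the quoted portion of last week's run covers about 16.5 hours up to
the final staging step, while this week's run is already past 22.5
hours with the cloud images still to come.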

I am inclined to do two things:

1) Immediately run a subsequent snapshot build to see if it takes longer
   than 24 hours (my gut tells me it will);
2) Reboot the machine with no other changes, and immediately run this
   week's snapshot builds again (not to be published, to avoid
   confusion).

Some interactive commands are significantly slower, such as systat,
vmstat, even top.  Creating a new window in tmux is also noticeably
slow.
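
The interactive slowness could be quantified rather than judged by
feel, for example before and after the reboot.  The following is only a
sketch: the helper name, the number of runs, and the choice of vmstat
(which prints one report and exits when run without arguments) are my
assumptions, and the Python interpreter itself serves as a baseline
command.

```python
import shutil
import subprocess
import sys
import time


def time_command(argv, runs=3):
    """Run argv `runs` times; return the wall-clock seconds of each run."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            argv,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
        samples.append(time.perf_counter() - start)
    return samples


# Baseline: process start-up cost of a trivial command.
commands = [[sys.executable, "-c", "pass"]]
# Hypothetical probe; skipped if vmstat is not installed.
if shutil.which("vmstat") is not None:
    commands.append(["vmstat"])

for argv in commands:
    samples = time_command(argv)
    print(argv[0], " ".join(f"{s:.3f}s" for s in samples))
```

Comparing the same numbers on the second or third machine would show
whether the delay is specific to this host.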

Is there a third option I am overlooking in trying to identify the
cause of this drastic slowdown?

Thank you in advance.

Glen