An amusing comedy of errors: my FreeBSD-14 upgrade

From: Andrew Reilly <areilly_at_bigpond.net.au>
Date: Mon, 27 Nov 2023 01:10:47 UTC
[TL;DR: -14 removed pam_opie.so, so leaving those lines in pam.d/system etc. prevented su/sudo;
        running zpool upgrade on the root drive without updating the gptzfsboot boot loader prevented rebooting.]

Thought that I'd write this down, in case it helps anyone.  The upgrade from stable-13 to stable-14 that I made to my file server over the weekend was one of the bumpier upgrades in my thirty-year history with FreeBSD (yes, I was there at the transition from the patchkit).  I'm happy to report that all of the pain described here was self-inflicted, and FreeBSD itself shone through with its delightful robustness and straightforward nature.  All ends well.

My file server, for now, is a mini-ITX AMD Zen 1700 system with four 8 TB spinning-rust drives in a single RAID-Z zpool, about 250 GB of NVMe M.2 flash holding root, usr, var and swap (boot to ZFS) and a USB-connected backup system also running ZFS.  I track -stable and update ports (with portmaster), user-space and kernel weekly.

The start was very simple, now that FreeBSD is on git: I just git switch'ed to the stable/14 branch, which went without a hitch.

Then I ran my usual weekly rebuild script, which got to the end without fuss, but with the usual complaint from etcupdate that there were some unresolved issues.  I should have been paying more attention to that: I was not using etcupdate correctly, and had not been since switching over from mergemaster a year or so ago.  Needless to say, that misconfiguration was the start of the troubles, and they kicked in immediately: after the upgrade I couldn't reboot, sudo or su, because my /etc/pam.d configuration still referred to pam_opie.so, whose removal I had not noticed.  I _could_ still ssh into the system because my ssh config had disabled PAM.  That didn't help though: I was still stuck as me and, without a working sudo, couldn't edit the config files.  (I've since rebuilt sudo to not use PAM either!)
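A check along the following lines, run before rebooting into the new world, would have flagged the stale pam_opie.so lines.  This is only a minimal sketch, not a tool that ships with FreeBSD: the function names are made up for illustration, the search directories (/usr/lib and /usr/local/lib) are an assumption about where PAM modules live, and a module is counted as present if a file with that name, or that name plus a version suffix, exists.

```python
from pathlib import Path

# Assumed module search path; adjust for the system being checked.
MODULE_DIRS = ("/usr/lib", "/usr/local/lib")


def referenced_modules(pam_dir):
    """Yield (file name, line number, module) for each module line in pam_dir."""
    for conf in sorted(Path(pam_dir).iterdir()):
        if not conf.is_file():
            continue
        text = conf.read_text(errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            # Drop comments, then split into: facility control module [options]
            fields = line.split("#", 1)[0].split()
            # "include" lines name another service file, not a module.
            if len(fields) < 3 or fields[1] == "include":
                continue
            yield conf.name, lineno, fields[2]


def module_exists(module, module_dirs=MODULE_DIRS):
    """Return True if the module is found by absolute path or in a search dir."""
    if module.startswith("/"):
        return Path(module).exists()
    for directory in module_dirs:
        base = Path(directory)
        if not base.is_dir():
            continue
        for entry in base.iterdir():
            # Accept an exact match or a versioned name such as pam_unix.so.6
            if entry.name == module or entry.name.startswith(module + "."):
                return True
    return False


def missing_modules(pam_dir="/etc/pam.d", module_dirs=MODULE_DIRS):
    """List every (file, line, module) whose module cannot be found."""
    return [
        (name, lineno, module)
        for name, lineno, module in referenced_modules(pam_dir)
        if not module_exists(module, module_dirs)
    ]


if __name__ == "__main__":
    for name, lineno, module in missing_modules():
        print(f"{name}:{lineno}: {module} not found")
```

Any line it prints is a candidate for removal or for a proper etcupdate merge; it does not understand every PAM syntax variant, so treat its output as a hint rather than a verdict.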

Easy enough to fix, right?  Power down and reboot into single-user mode and go from there.  Unfortunately I had ripped the graphics card out of the system a long time ago in an attempt to keep a bit of heat out of the box, and had apparently lost it in a couple of intervening house moves.  Perhaps I'd donated it to the electronics recycling mob along with a box of old cables and power supplies.  It was too late to go and get one on Saturday afternoon, but I found a store some distance away that would sell me one on Sunday morning.  With new graphics card in hand, I powered down, took the lid off the server, carefully lifted out the hard-drive cage and installed the GPU.  Plugged in monitor and keyboard and powered up.

Single-user mode did the job: I edited the pam.d files and was just about to reboot when I checked zpool status to see why the boot messages had said something about my main array operating in "degraded" mode.  One drive was apparently not found/attached.  On closer inspection I discovered that I'd dislodged its power plug when I took the cage out.  I'd fix that when I did the next power-down.  But in the meantime, zpool status had also taunted me with new features that I could enable.  So I did, on the root drive.  And power-cycled.  And stared dumbly at the boot screen telling me that it couldn't find any bootable drives, because the one that was there had an incompatible zpool version.  Aargh!  The boot loader had not been updated!  Couldn't even get to single-user mode to fix it.

I downloaded the 14-release bootonly image from the FreeBSD web site and found a suitable thumb drive to put it on.  Power cycled the box again and told the boot menu to boot from the thumb drive.  There followed a great deal of gpart footling while I tried to remember just how I had the drives arranged, but in the end I found the magic incantation (gpart bootcode -p /boot/gptzfsboot -i 1 nda0) to install the new version of the boot loader.  Rebooted again, this time to the main system, rather than the thumb drive, and that worked.  ZFS resilvered the previously missing drive quicker than I could notice, and subsequent scrubs found nothing in need of fixing.
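Most of the gpart footling was about finding which disk and partition index hold the freebsd-boot partition.  As a rough illustration, the sketch below reads the text printed by "gpart show" and builds the same form of bootcode command quoted above.  The helper names and the sample partition layout are hypothetical (the real layout of this machine is not given here), and the code only prints a command string, it does not run anything.

```python
def boot_partitions(gpart_show_output):
    """Return [(disk, index)] for every freebsd-boot partition listed."""
    found = []
    disk = None
    for line in gpart_show_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "=>":
            # Header line: => start size name scheme (size)
            disk = fields[3] if len(fields) > 3 else None
        elif len(fields) >= 4 and fields[3] == "freebsd-boot" and disk:
            # Partition line: start size index type (size)
            found.append((disk, int(fields[2])))
    return found


def bootcode_commands(gpart_show_output, loader="/boot/gptzfsboot"):
    """Build one 'gpart bootcode' command per freebsd-boot partition."""
    return [
        f"gpart bootcode -p {loader} -i {index} {disk}"
        for disk, index in boot_partitions(gpart_show_output)
    ]


if __name__ == "__main__":
    # Hypothetical layout, shaped like "gpart show" output.
    sample = """\
=>       40  488397088  nda0  GPT  (233G)
         40       1024     1  freebsd-boot  (512K)
       2048    4194304     2  freebsd-swap  (2.0G)
    4196352  484200448     3  freebsd-zfs  (231G)
"""
    for command in bootcode_commands(sample):
        print(command)
```

The ordering is the real lesson: the new boot code has to be written before zpool upgrade is run on the root pool, while the old loader can still read it.  A machine that boots via UEFI keeps its loader on the EFI system partition instead, so this gptzfsboot step would not apply there.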

Everything is now hunky-dory.

Thanks to the always-wonderful FreeBSD team for continuing to produce a system that can be understood in sufficient detail to fairly easily dig oneself out of what might otherwise be catastrophic misadventures!