Re: FreeBSD 13.2-STABLE can not boot from damaged mirror AND pool stuck in "resilver" state even without new devices.

From: Lev Serebryakov <lev_at_FreeBSD.org>
Date: Sun, 07 Jan 2024 20:49:24 UTC
On 07.01.2024 19:34, Warner Losh wrote:

> I must have missed it. What were the diagnostics?

zio_read error: 5
zio_read error: 5
zio_read error: 5
ZFS: i/o error - all block copies unavailable
ZFS: can't read MOS of pool zroot


  To be honest, I think there is something else going on, because the sequence of events was (sorry, it is long, but I think that every detail matters here):

(1) Update from 12.4 to 13.2, with installation of the new gptzfsboot with gpart on both disks. The update could have placed the new /boot far away, but see (2).
(2) Reboot, which completed, but showed that ada0 has problems.
(3) Replacement of ada0 by DC technicians, new disk is 512/4096, old disk is 512/512, pool has ashift=9
(4) Server refuses to boot from ada1 (ada0 is empty) with diagnostics (see above)
(5) Linux rescue system, passing 2 devices to qemu with FreeBSD (because Linux shows that ZFS is on whole disk, not on partition!).
(6) Re-creation of GPT on ada0, start of resilver (with sub-optimal ashift!).
(7) Interruption of resilver with reboot, because it is painfully slow under qemu.
(8) Wipe of ada0 (at this point the resilver status of the pool becomes crazy) to put a live FreeBSD image on it and boot somehow.
(9) Many attempts to cancel the resilver and boot from the single-disk "historical" pool on ada1, no success. I attributed it to the strange state of the pool: one component, no mirror, but "resilvering".
(10) Boot from small UFS partition (which replaces swap partition).
(11) Pool on ada1 (old, live, 512/512 disk) is still "resilvering" without any additional components (with zero speed, of course).
(12) Prepare partitions on ada0 again, creating new pool with ashift=12, send|receive.
(13) Removing partition on ada1 (old one, ashift=9, still resilvering after many-many reboots with only one device in it).
(14) Boot from the fresh ada0 pool: same errors from gptzfsboot, fail, and gptzfsboot reports the OLD pool (which should not be available, as the GPT on ada1 was wiped out!!!!)
(15) Boot from UFS again.
(16) Adding a partition of ada1 as the second component of the new pool, resilvering successful.
(17) Boot with gptzfsboot still fails! With brand-new ashift=12 pool! Now bootloader reports new pool name, but still fails to boot.
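
  For reference, the "sub-optimal ashift" in (3), (6) and (12) is plain arithmetic: ashift is the base-2 logarithm of the smallest block ZFS allocates on a vdev, so ashift=9 means 512-byte blocks and ashift=12 means 4096-byte blocks. A minimal Python sketch of that check follows; the function names are made up for illustration and are not part of any ZFS tool, and it says nothing about why gptzfsboot fails.

```python
def ashift_to_block_size(ashift: int) -> int:
    """Smallest allocation size in bytes for a given ashift (2 ** ashift)."""
    return 1 << ashift


def is_aligned_for(ashift: int, physical_sector: int) -> bool:
    """True if every ashift-sized block is a whole number of physical sectors."""
    return ashift_to_block_size(ashift) % physical_sector == 0


if __name__ == "__main__":
    # Values taken from the event list: the old disk is 512/512,
    # the replacement disk is 512/4096 (logical/physical sector size).
    for ashift in (9, 12):
        for physical in (512, 4096):
            print(
                f"ashift={ashift} ({ashift_to_block_size(ashift)} bytes) "
                f"on {physical}-byte physical sectors: "
                f"{'aligned' if is_aligned_for(ashift, physical) else 'sub-optimal'}"
            )
```

  With these numbers only the combination of ashift=9 and 4096-byte physical sectors comes out as sub-optimal, which is the situation in step (6).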

  You see, the buildworld update could have placed /boot too far away. But there was one last successful boot between (1) and (3)! And the state of the pool on the live disk ada1 was very strange: I could not cancel the resilver, no matter what I tried, until I zapped the GPT and started over.

> If people want to continue to support BIOS booting (or rather, booting using the CSM interfaces), then somebody is going to need to step up to the plate and implement a similar option in bsdinstall, bectl, freebsd-update, etc.

   I can use UEFI boot without problems, but now I'm not sure whether it will work for me.

-- 
// Lev Serebryakov