Re: FreeBSD 13.2-STABLE can not boot from damaged mirror AND pool stuck in "resilver" state even without new devices.

From: Lev Serebryakov <lev_at_FreeBSD.org>
Date: Sun, 07 Jan 2024 20:56:59 UTC
On 07.01.2024 21:49, Lev Serebryakov wrote:

> On 07.01.2024 19:34, Warner Losh wrote:
> 
>> I must have missed it. What were the diagnostics?

  Oh, and there were two "nvlist inconsistency" messages before these: vvvv

> zio_read error: 5
> zio_read error: 5
> zio_read error: 5
> ZFS: i/o error - all block copies unavailable
> ZFS: can't read MOS of pool zroot
> 
> 
>   To be honest, I think there is something else going on, because the sequence of events was (sorry, this is long, but I think that every detail matters here):
> 
> (1) Update from 12.4 to 13.2, including installation of the new gptzfsboot with gpart on both disks. It could have placed the new /boot far away, but see (2).
> (2) Reboot, which completed, but showed that ada0 has problems
> (3) Replacement of ada0 by DC technicians, new disk is 512/4096, old disk is 512/512, pool has ashift=9
> (4) Server refuses to boot from ada1 (ada0 is empty) with diagnostics (see above)
> (5) Linux rescue system, passing 2 devices to qemu with FreeBSD (because Linux shows that ZFS is on whole disk, not on partition!).
> (6) Re-creation of GPT on ada0, start of resilver (with sub-optimal ashift!).
> (7) Interruption of resilver with reboot, because it is painfully slow under qemu.
> (8) Wipe of ada0 (at this point the resilver status of the pool goes crazy) to put a live FreeBSD image on it and boot somehow.
> (9) Many attempts to cancel the resilver and boot from the single-disk "historical" pool on ada1, with no success. I attributed this to the strange state of the pool: one component, no mirror, but "resilvering".
> (10) Boot from small UFS partition (which replaces swap partition).
> (11) Pool on ada1 (old, live, 512/512 disk) is still "resilvering" without any additional components (with zero speed, of course).
> (12) Prepare partitions on ada0 again, creating new pool with ashift=12, send|receive.
> (13) Removing partition table on ada1 (with old pool, ashift=9, still resilvering after many-many reboots with only one device in it).

  And please note: this pool on ada1 (the old, live disk) was NOT upgraded after 12-STABLE. It was an old, 12-STABLE-"level" pool with all new features disabled.
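
  For reference, the ashift values in (3), (6) and (12) are just the base-2 logarithm of the sector size the pool is aligned to: 512 bytes gives 9, 4096 bytes gives 12. This is why ashift=9 is sub-optimal on the new 512/4096 disk. A minimal sketch of that arithmetic follows. The function names are hypothetical and only for illustration; they are not part of any ZFS tool.

```python
def ashift_for_sector_size(sector_bytes: int) -> int:
    """Return log2 of a power-of-two sector size, i.e. the matching ashift."""
    if sector_bytes <= 0 or sector_bytes & (sector_bytes - 1):
        raise ValueError("sector size must be a positive power of two")
    return sector_bytes.bit_length() - 1


def is_suboptimal(pool_ashift: int, physical_sector_bytes: int) -> bool:
    """True when the pool's allocation unit is smaller than the physical sector."""
    return pool_ashift < ashift_for_sector_size(physical_sector_bytes)


if __name__ == "__main__":
    # Old disk: 512 logical / 512 physical; new disk: 512 logical / 4096 physical.
    print(ashift_for_sector_size(512))
    print(ashift_for_sector_size(4096))
    print(is_suboptimal(9, 4096))
```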


-- 
// Lev Serebryakov