FreeBSD 13.2-STABLE cannot boot from a damaged mirror, AND the pool is stuck in the "resilver" state even without new devices.

From: Lev Serebryakov <lev_at_FreeBSD.org>
Date: Fri, 05 Jan 2024 17:28:55 UTC
Hello!

    I have a (remote) physical server with 2 SATA disks. These disks were partitioned with GPT into "freebsd-boot" (ada{0|1}p1, the legacy one, not EFI), "freebsd-swap" (ada{0|1}p2) and "freebsd-zfs" (ada{0|1}p3).

    Both disks were 512/512 (logical/physical sector size; this looks important).

    I have only one ZFS pool, "zroot", a mirror of "ada0p3" and "ada1p3".

    I have a very fresh "gptzfsboot" on both "ada0p1" and "ada1p1".

    Now ada0 has failed. The DC support staff replaced it with a new disk, which is 512/4096.

    After that, my server fails to boot: gptzfsboot from the second disk (ada1) reports several "zio_read error: 5" errors, followed by

ZFS: i/o error - all block copies unavailable
ZFS: can't read MOS of pool zroot

    I booted into a rescue Linux (unfortunately, there is NO rescue FreeBSD at Hetzner anymore), and Linux could import the (degraded) pool without any problem. But Linux has problems detecting the pool on a partition, so I didn't do anything under Linux.

    I did check the "live" disk under Linux, though: it reads fine, SMART is clean, everything is OK.

    I booted FreeBSD 13.2 from the installation ISO under qemu, with the physical devices as disks. Then I partitioned the fresh HDD and started the disk replacement in the mirror. It worked, but the resilver was unbearably slow. I stopped the FreeBSD VM, planning to continue the process after a normal boot.

    NO LUCK. "zio_read error: 5", boot failed.

    Then I overwrote ada0 (the new disk) with the FreeBSD memstick IMG and booted from it. It can import the pool from ada1p3 but, of course, the resilver is stopped.

    I removed all faulted components, effectively converting the mirror to a "simple" device. But "zpool status" still shows a resilver in progress!

    And "gptzfsboot" still CAN NOT read this ZFS pool and find the loader!

    OK, I converted the swap partition to UFS and now boot from UFS. It works, and it can use the pool as root. But the pool is still "resilvering".


    Now I have a very strange situation:

  (1) I have a ZFS pool with 1 device, which says:

% zpool status -v zroot
   pool: zroot
  state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
         continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
   scan: resilver in progress since Fri Jan  5 19:24:07 2024
         750G scanned at 472B/s, 40.5G issued at 25B/s, 974G total
         0B resilvered, 4.16% done, no estimated completion time
config:

         NAME        STATE     READ WRITE CKSUM
         zroot       ONLINE       0     0     0
           ada1p3    ONLINE       0     0     0

errors: No known data errors
%

  (2) gptzfsboot from this very system version cannot read this pool and boot from it;
  (3) the kernel can use this pool as the source of the root (and all other) filesystems.

-- 
// Lev Serebryakov