Re: FreeBSD 13.2-STABLE can not boot from damaged mirror AND pool stuck in "resilver" state even without new devices.

From: Warner Losh <imp_at_bsdimp.com>
Date: Sun, 07 Jan 2024 21:06:26 UTC
On Sun, Jan 7, 2024 at 12:01 PM Miroslav Lachman <000.fbsd@quip.cz> wrote:

> On 07/01/2024 19:34, Warner Losh wrote:
>
> > < 4294967296 sectors should be good. So these drives shouldn't see this
> > problem. the BIOS interfaces should have no trouble here.
>
> [...]
>
> > Yes. If the drives are > 2TB you lose. BIOS is not for you...  Unless
> > you make special partitions that are in the first 2TB of the drive and
> > only boot off of those. Also, if the drives are 4k, you likely lose,
> > though it's hit or miss. Those are the hard limits of the BIOS ABI.
>
> It is not always that simple math. As I wrote in my previous reply, my
> pool was unbootable in one machine but boots fine in the other. Both
> were Intel based amd64 with BIOS, not EFI. I think there are some buggy
> BIOSes where it cannot boot even on smaller pools than 2TB. (or maybe
> some improved BIOSes supporting larger boundaries than 2TB? I don't know
> in what exact position bootloader / kernel was on my 4TB pool)
>

OK. If the problem is that int13 has only 32 bits in the ABI, the math is
that simple. The limit is 2^32 blocks, and there's no reliable provision
for 4k sector sizes (some BIOSes will do it, others won't... it's a bit
muddled looking at the problem reports, though we do try to support that).
I haven't seen a BIOS64 implementation that extends the int13 interfaces
to do wider block sizes... It's just that it's so close it's easy to
gravitate to a known issue...
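
A quick sketch of that arithmetic, for anyone who wants to check a given
drive. The 512 and 4096 byte sector sizes and the function names are
assumptions for illustration only, and the 4k figure only applies on a
BIOS that actually handles 4k sectors, which as noted is hit or miss:

```python
# Sketch only: assumes block numbers are limited to 32 bits.
LBA_BITS = 32


def max_addressable_bytes(sector_size, lba_bits=LBA_BITS):
    """Bytes reachable when block numbers are limited to lba_bits bits."""
    return (1 << lba_bits) * sector_size


def all_lbas_fit(total_sectors, lba_bits=LBA_BITS):
    """True if the last sector's LBA (total_sectors - 1) fits in lba_bits."""
    return (total_sectors - 1) < (1 << lba_bits)


if __name__ == "__main__":
    for size in (512, 4096):
        tib = max_addressable_bytes(size) / 2**40
        print(f"{size}-byte sectors: {tib:g} TiB addressable")
```

With 512-byte sectors that works out to 2 TiB, which is where the "2TB"
figure comes from.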

If other weird things are happening, then we may have a type problem
that's truncating the logical block number (which the BIOS doesn't care
about) to 32 bits (or maybe only sometimes), which then leads to weird
things happening. But... UEFI should suffer this same problem and we
should hear about it a lot, I'd think (though maybe how gptzfsboot is
compiled might be the culprit, since that's the only thing confined to the
gpt boot blocks that's not common binary code (we #include the
implementation to make two different binary things...)). It shouldn't care
that the copy of /boot/loader is past the 2TB logical limit, because the
drives are smaller than 2TB and so none of their LBAs will be > 2^32 and
should all work. If that's indeed the issue, then there's something weird
about how we build it for gptzfsboot.
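
To make the suspected failure mode concrete, here is a purely
hypothetical illustration of what a 64-bit to 32-bit truncation does to
a block number. It is not taken from the loader source and the names are
made up; it only shows why an offset past 2^32 blocks would silently
land on the wrong block:

```python
# Hypothetical illustration only; not code from gptzfsboot or the loader.
MASK32 = 0xFFFFFFFF


def truncate_to_u32(block):
    """Mimic storing a wider block number in a 32-bit unsigned variable."""
    return block & MASK32


def is_corrupted_by_truncation(block):
    """True if the 32-bit copy no longer equals the original block number."""
    return truncate_to_u32(block) != block


if __name__ == "__main__":
    # A block just past the 2^32 boundary wraps around to a low number.
    block = 2**32 + 100
    print(block, "->", truncate_to_u32(block))
```

Anything below 2^32 passes through unchanged, which would fit a problem
that only shows up on some pools.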

The other thing it could be, though, is that if there's a resilver in
progress, there's some subtle state that's confusing the simple
reimplementation of ZFS reading that's in the boot loader. Though I'd
expect to have heard about that before now, especially since this would
hit UEFI booting as well.

Warner