[Bug 277886] ZFS boot loader gives up too easily on unsupported zpool flags

From: <bugzilla-noreply_at_freebsd.org>
Date: Sat, 13 Apr 2024 15:54:44 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=277886

--- Comment #7 from Warner Losh <imp@FreeBSD.org> ---
(In reply to Edward Tomasz Napierala from comment #6)
> Well of course it's not.  But the worst case we're risking here is what is currently the only case: a boot failure.

I think it's a bad idea. It will transform the boot failure from a well known
one (an error message saying it can't find loader.lua) to some random thing
that may work for a while, but then randomly stop working in the future. Or
some hang that's hard to notice, or something else entirely. It would be a
support nightmare, so I'm very close to a hard no on doing it unconditionally
because I'll be the one that has the extra work.

However, there are other options. First, we could have a built-in command that
sets a global flag to force the operation and retry. This isn't terrible to
implement, but is somewhat of a pain because we'll need special code in every
single driver to do this. And it can't work in boot1.efi, but I don't care
about that because boot1.efi is deprecated.

Second, we could prompt the user when we detect the problem whether or not to
continue anyway. I think we always have a console, though it might not be the
user's preferred console at this point (since that preference is set from the
very filesystem we're trying to read). We do have conin for EFI, even
boot1.efi.

Third, for BIOS booting (and to a lesser extent EFI) we have a command line
option we can pass in from boot0 that could force it. EFI could have an
environment variable that controls it (for those systems that don't let you set
a command line option).

Fourth, and this is another of the modify-all-the-boot-loaders options, would
be to try what we do now, then test to see if we can read loader.lua (the 4th
loader won't be modified: its feature set is frozen and this is a new feature). If we
can't read loader.lua, we know we're about to fail, so we can try again with a
global force flag set (after a brief pause to announce we're doing this and all
bets are off and we might reset or hit an assertion in the ZFS code). This I
think I like best because it's the safest one that could be automatic. Plus we
can set a kenv that communicates to an rc file to print a big, ugly warning on
boot to say we had to do this to read the ZFS pool and you got lucky: next
update or even next boot you might not be so lucky.
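
A minimal sketch of the control flow in the fourth option, as a toy model in
C rather than actual loader code. Every name here (struct pool,
zfs_force_unsupported, can_read_loader_lua, try_boot, and the kenv name
zfs_forced_import) is hypothetical and chosen only for illustration; the real
loader's ZFS interfaces are not shown.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Toy stand-in for a pool; NOT the loader's real data structures. */
struct pool {
    bool has_unsupported_features; /* feature flags the boot code doesn't know */
    bool readable_when_forced;     /* whether ignoring them happens to work */
};

enum boot_result { BOOT_OK, BOOT_OK_FORCED, BOOT_FAILED };

/* Hypothetical global force flag that each driver would have to consult. */
static bool zfs_force_unsupported = false;

/* Hypothetical probe: can we open the pool and read loader.lua? */
static bool
can_read_loader_lua(const struct pool *p)
{
    if (!p->has_unsupported_features)
        return true;
    return zfs_force_unsupported && p->readable_when_forced;
}

/*
 * Try the normal path first. Only if loader.lua is unreadable (so we are
 * about to fail anyway), announce the retry and try again with the force
 * flag set. On a forced success, record a (hypothetical) kenv so an rc
 * script could print a warning later in boot.
 */
static enum boot_result
try_boot(const struct pool *p, char *kenv, size_t kenvlen)
{
    zfs_force_unsupported = false;
    kenv[0] = '\0';

    if (can_read_loader_lua(p))
        return BOOT_OK;

    printf("Cannot read loader.lua; retrying with unsupported pool "
        "features ignored. All bets are off.\n");
    /* A real loader would pause briefly here so the message can be read. */

    zfs_force_unsupported = true;
    if (!can_read_loader_lua(p))
        return BOOT_FAILED;

    snprintf(kenv, kenvlen, "zfs_forced_import=1");
    return BOOT_OK_FORCED;
}
```

The point of the ordering is that the forced path is only reached when the
normal path has already failed, so a pool that boots today never sees the
riskier code.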

Of course, 'don't update the zpool unless the boot blocks support it' is the
best option.
