loader.efi architecture for replacing boot1.efi

Eric McCorkle eric at metricspace.net
Sat Dec 16 04:54:19 UTC 2017



On 12/15/2017 22:28, Warner Losh wrote:
> 
> 
> On Fri, Dec 15, 2017 at 7:05 PM, Warner Losh <imp at bsdimp.com
> <mailto:imp at bsdimp.com>> wrote:
> 
> 
> 
>     On Dec 15, 2017 6:43 PM, "Eric McCorkle" <eric at metricspace.net
>     <mailto:eric at metricspace.net>> wrote:
> 
>         On 12/15/2017 20:09, Warner Losh wrote:
> 
>         > This should be second. UEFI variables trump all.
>         >
>         >     2) If not, then attempt to read EFI vars to determine the boot location
>         >
>         >     3) If no EFI vars are defined, and no partition was specified, fall back
>         >     to looking for an installed system on devices
>         >
>         >
>         > This is fine, so long as it is only on the device that the loader loaded
>         > from.
> 
>         It's fine if it's configurable, but there needs to be sane
>         behavior if the EFI vars aren't set.
> 
> 
>     Where do we get this info for such a broken setup? Do you have
>     actual examples?
> 
>         >     4) At the very last, do the legacy (what loader.efi currently does)
>         >     behavior.
>         >
>         >
>         > This is bogus. It violates the uefi boot loader protocol. We must
>         > abandon this legacy behavior. The behavior is actively harmful since
>         > something random will boot. This has caused actual operational issues at
>         > Netflix. Guessing is really bad.
> 
>         We can't just ditch the current behavior and break everyone's
>         existing install, though.  Legacy behavior should be supported
>         at least until the next major release.
> 
> 
>     What useful setups does this break? Absent a real example, we
>     absolutely are breaking this. There is a real cost to doing this
>     that as the de facto maintainer of stand I'm unwilling to maintain,
>     test or commit to not breaking. The legacy behavior is broken and
>     has caused me hours of pain in production. There has been no
>     articulated use case this enables, especially since boot loader can
>     be interrupted to specify something in recovery scenarios.
> 
> 
>         >
>         >     Step (3) is done by attempting to stat /boot/loader.conf and
>         >     /boot/kernel.  First, all partitions on the same disk are
>         >     searched, then all remaining partitions are searched.
>         >
>         >     This should allow mechanisms like EFI vars and command-line
>         >     args to work without interference from the fallback
>         >     mechanisms.  However, it also provides robustness in the face
>         >     of failure modes and uninitialized systems (I personally ran
>         >     into a problem a while back with a linux system, where I
>         >     couldn't boot with EFI, because the EFI vars weren't set,
>         >     because I couldn't set them if I couldn't boot with EFI; had
>         >     to use Shell.efi to sort out the mess...)
>         >
>         >     More importantly, it provides a seamless transition from the
>         >     way things are now to the way we want things to be.
>         >
>         >     Please provide comments and feedback.
>         >
>         >
>         > Please listen when I say searching all devices is actively
>         > harmful. The uefi boot manager, which I'm in the process of
>         > bringing in, offers a way to specifically say what you want to
>         > boot. If someone needs something complicated, they must use that
>         > moving forward. Part of what makes the protocol work is loaders
>         > giving up early so the next one on the list can be tried.
> 
>         We also have to deal with the reality that some EFI
>         implementations are adversarial.  We have to be able to deal
>         with implementations that make it difficult to set EFI vars, or
>         which mess with their values (Lenovo is particularly notorious
>         for this).
> 
>         You can disable fallback mechanisms with command-line args or
>         macros or whatever, but they need to be there.
> 
> 
>     No. Absent a sane use case, I refuse. Give me a reasonable use case,
>     I will reconsider.
> 
> 
> So the current behavior leads to absurd results that nobody else does,
> and that we don't do for legacy boot:
> 
> If we boot loader.efi/boot1.efi off a hard drive, and find there's no
> kernel, we'll load off cdrom or a floppy if we happen to find a kernel
> there. That's nuts. What's more, we'll load off a different device (say
> a thumb drive), which is also crazy. The last thing you want is to
> accidentally pick the thumb drive recovery kernel that happens to be in
> a USB slot when you have a primary and secondary partition on two main
> disks, but today's behavior chooses that. It's so crazy that I can see
> no benefit from supporting, testing and maintaining this. If someone
> wants to recover a system, they can do it at the boot loader prompt now
> (they couldn't before). If someone really wants to boot his crazy thing,
> we have a new way to specify it specifically w/o any ambiguity based on
> how the devices might move around.
> 
> We already support about 100 boot scenarios that are hard enough to
> test. I don't want to commit to supporting this and making it 120 or 150
> once you work out all the combinatorics. We have to trim the matrix of
> useless things.  So absent a use case that makes sense, that people are
> actually doing, I'm having a hard time justifying keeping it around as
> we transition.
> 
> Warner
> 
> P.S. On x86, we support geli/nogeli, gpt/mbr, ufs/zfs, and
> uefi/legacy/both (24 combinations). Plus we support booting off CDROM,
> netbooting, etc. For arm, and arm64 we have a similar number that are
> possible. zfs/ufs, u-boot/uefi, and mbr/gpt (plus a number of different
> u-boot boards). For mips we have a similar mix. Powerpc we support 4 or
> 6 ways. It's just too much to hope to test and ensure works. Each new
> thing has a non-trivial cost, and I see zero benefit from this one more
> thing, especially since it gets in the way of UEFI boot manager support.
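
For reference, the x86 count quoted above can be reproduced with a short
enumeration.  This is only an illustrative sketch in Python; the list names
are made up here, and it covers just the four axes named in the P.S., not
CDROM or netboot.

```python
from itertools import product

# The four x86 axes named in the quoted P.S. (variable names are illustrative).
encryption = ["geli", "nogeli"]
partitioning = ["gpt", "mbr"]
filesystem = ["ufs", "zfs"]
firmware = ["uefi", "legacy", "both"]

# Every combination of one choice per axis.
combinations = list(product(encryption, partitioning, filesystem, firmware))
print(len(combinations))  # 2 * 2 * 2 * 3 = 24
```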

Whatever happens, this needs to not break existing installs.  We can
remove probing of floppy drives, fine (does anyone even HAVE those
anymore?).  Removing probing of CD-ROM drives will break auto-detection
when booting from a liveDVD, but that can be mitigated by specifying
loader args (I suppose we'll need to have loader get args from the
boot.config files eventually).  But for now, loader.efi has got to work
whether installed in a boot1/loader (legacy) configuration or installed
directly to the ESP.  Otherwise, there's going to be a lot of unhappy
people out there.

As for the fallback search, it's just that: a fallback mechanism.  Its
job is to make a sane guess as to where to find the system, but
ultimately it's not doing anything the user can't do themselves.  And it
will only run if the EFI vars aren't set anyway, so it can't possibly
interfere with any of that.
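
To make the ordering concrete, here is a rough sketch in Python (not the
loader's C code, and not a proposed implementation).  Every name in it is
hypothetical: Partition, find_boot_partition, and the has_system flag, which
stands in for a successful stat of /boot/loader.conf and /boot/kernel.  It
assumes the first step is an explicitly specified partition, follows the
ordering in the quoted list (not the EFI-vars-first ordering Warner asks
for), and leaves the legacy step (4) out entirely.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Partition:
    disk: str         # hypothetical disk identifier, e.g. "disk0"
    index: int        # partition number on that disk
    has_system: bool  # stands in for stat() of /boot/loader.conf and /boot/kernel


def find_boot_partition(
    partitions: Sequence[Partition],
    loader_disk: str,
    specified: Optional[Partition] = None,
    efi_var: Optional[Partition] = None,
) -> Optional[Partition]:
    # Assumed step 1: an explicitly specified partition is used as-is.
    if specified is not None:
        return specified
    # Step 2: if EFI vars name a boot location, use it and never search.
    if efi_var is not None:
        return efi_var
    # Step 3: fallback search, partitions on the loader's own disk first,
    # then all remaining partitions.
    same_disk = [p for p in partitions if p.disk == loader_disk]
    others = [p for p in partitions if p.disk != loader_disk]
    for candidate in same_disk + others:
        if candidate.has_system:
            return candidate
    # Nothing found; step 4 (legacy behavior) is not modeled here.
    return None
```

The point of the early returns is that the search loop is unreachable once
a partition is specified or the EFI vars are set.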

