Re: Using a recovery partition to repair a broken installation of FreeBSD

From: Tomoaki AOKI <junchoon_at_dec.sakura.ne.jp>
Date: Wed, 03 Sep 2025 09:28:34 UTC
On Tue, 2 Sep 2025 13:25:59 -0600
Warner Losh <imp@bsdimp.com> wrote:

> On Tue, Sep 2, 2025 at 7:55 AM Tomoaki AOKI <junchoon@dec.sakura.ne.jp>
> wrote:
> 
> > On Mon, 1 Sep 2025 21:02:45 -0600
> > Warner Losh <imp@bsdimp.com> wrote:
> >
> > > On Mon, Sep 1, 2025 at 5:42 AM Tomoaki AOKI <junchoon@dec.sakura.ne.jp>
> > > wrote:
> > >
> > > > On Mon, 1 Sep 2025 03:15:50 -0600
> > > > Warner Losh <imp@bsdimp.com> wrote:
> > > >
> > > > > On Mon, Sep 1, 2025, 3:05 AM Poul-Henning Kamp <phk@phk.freebsd.dk>
> > > > > wrote:
> > > > >
> > > > > > --------
> > > > > > Tomoaki AOKI writes:
> > > > > >
> > > > > >
> > > > > > > >  > … it would be nice to have something like 'recovery
> > > > > > > >  > partition', as some OSes have. or at least some tiny
> > > > > > > >  > fail-safe feature. having remote machine in some distant
> > > > > > > >  > datacenter, booting from a flashstick is always a problem.
> > > > > >
> > > > > > I thought that is what /rescue is for ?
> > > > > >
> > > > >
> > > > > That only works if your boot loader can read it... I've thought for
> > > > > a while now that maybe we should move that into a ram disk image
> > > > > that we fall back to if the boot loader can't read anything else...
> > > > >
> > > > > Warner
> > > >
> > > > Exactly. If the loader (or the bootcode that starts the loader in
> > > > the partition/pool) can sanely read the partition/pool to boot from,
> > > > I think /rescue is enough and there is no need for a rescue
> > > > "partition / pool".
> > > >
> > > > But once the partition/pool to boot from is broken (including a
> > > > decryption key for encrypted partitions/drives that can no longer be
> > > > read from its usual place), something else is needed.
> > > >
> > > > And what can be chosen to boot from the BIOS/UEFI firmware depends
> > > > on the implementation (some could restrict the choice to per-drive
> > > > only, instead of every entry in the EFI boot manager table).
> > > >
> > > > If the BIOS/firmware allows choosing which "drive" to boot, a rescue
> > > > "drive" is useful, provided multiple physical drives are available.
> > > >
> > > > Yes, a rescue mfsroot embedded into loader.efi would be a candidate,
> > > > too, if the size of the ESP allows.
> > >
> > >
> > > Rescue is quite small. On the order of 8MB compressed. The trouble is
> > > that the kernel is like 12MB compressed, plus we'd need a few more
> > > modules. Still, we could likely get something under 25MB that's an MD
> > > image that we could boot into, but it would have to be single user.
> > > And it's been a while since I did that... Typically I just run
> > > /rescue/init or /rescue/sh, which isn't a full system and still uses
> > > the system's /etc. If we customized it per system, we could do better,
> > > since the kernel can be a bit smaller (compressed, our kernels at work
> > > are 6MB), so under 20MB could be possible. We'd not need
> > > /boot/loader.efi in there.
> >
> > Oh, much smaller than I expected!
> >
> > Actually, using boot1.efi (either stock or patched), users of root on
> > ZFS can have a rescue UFS partition on the same drive. This is because
> > it looks for /boot/loader.efi to start, from the ZFS pool first, then
> > from UFS. This priority is per drive, and if neither is found,
> > boot1.efi looks at the other drives in the order the UEFI firmware
> > recognized them. (The first drive tried is the one boot1.efi itself was
> > started from.)
> >
> > This is how smh@ implemented it when I asked him to fix a boot issue
> > with UEFI boot (at that time, loader.efi could not be started directly
> > by the UEFI firmware, and boot1.efi was needed).
> >
> 
> This isn't true, at least not generally. We load loader.efi in all new
> installations by default. I've fixed a number of issues around this in
> the past... We're not able to use it at Netflix to boot off of ZFS, for
> example...

This is why I believe you're the best person to ask about loader. ;-)
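
By the way, on the MD image idea above: here is a minimal sketch of how a
rescue mfsroot could be wired up via loader.conf(5), assuming the image
lives at /boot/mfsroot. The knobs follow the usual mfs_root preload
convention (as used by e.g. mfsBSD); treat the path and values as an
example, not a recipe:

  # hypothetical /boot/loader.conf fragment: preload a rescue ramdisk
  mfsroot_load="YES"                  # preload the image as a module
  mfsroot_type="mfs_root"             # mark it as an MD root image
  mfsroot_name="/boot/mfsroot"        # example path on the boot filesystem
  vfs.root.mountfrom="ufs:/dev/md0"   # mount the preloaded MD as root

If the image carries its own /etc and /rescue, everything including init
comes from the image itself, so it would not depend on the damaged
system's /etc the way running /rescue/sh does.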


> > Maybe Warner would remember: before the fix, boot1.efi always looked
> > for /boot/loader.efi in the order the UEFI firmware recognized drives;
> > thus, even if started from a USB memstick for rescue, boot1.efi
> > "always" started from the first "internal" drive and could not rescue.
> > Yes, fresh installations were OK with it, as there was no
> > /boot/loader.efi on any of the internal drives.
> >
> 
> Yea, I'm not remembering it...

It was late January 2016.

  https://lists.freebsd.org/pipermail/freebsd-current/2016-January/059387.html
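
For concreteness, the rescue UFS partition setup I described could look
roughly like this (a sketch only; the device name, label, and size are
examples, assuming a GPT drive ada0 with free space):

  # hypothetical example: add a small UFS rescue partition on ada0
  gpart add -t freebsd-ufs -s 2g -l rescue ada0
  newfs /dev/gpt/rescue
  mount /dev/gpt/rescue /mnt
  # give the UFS fallback something to find: boot1.efi looks for
  # /boot/loader.efi there, and /rescue is statically linked
  cp -Rp /boot /mnt/
  cp -Rp /rescue /mnt/
  umount /mnt

That is not a complete recipe (fstab and loader.conf on the rescue
partition would need some care), but it shows the layout the UFS fallback
can pick up once the main pool is unreadable.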

> > > If we could hook into the arch specific traps that cause segv, etc, we
> > > could do a setjmp early and set 'safe mode' and restart.  Though that
> > > may be trickier than I initially am thinking... maybe the best bet is
> > > to let uefi catch that failure and have the next bootable BootXXXX
> > > environment on the list specify a safe mode. More investigation might
> > > be needed.
> > >
> > > Warner
> >
> > Yeah, and it could be (and actually would be) implementation-specific.
> > It might be chaotic in the real world, and lots of quirks would be
> > required.
> >
> 
> I don't understand that part... It would be architecture specific, but why
> would it be implementation specific?
> 
> Warner

Even for mandatory features, implementations that misunderstand the spec
can behave in implementation-specific ways, especially in the early phase
of a standard, unfortunately. Do you remember the early PCI (not PCIe!)
incompatibility issues? And early USB, too, IIRC.
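
Coming back to your setjmp idea, the control flow I imagine is something
like the following. It is a pure sketch: arch_set_trap_hook() and
run_loader() are hypothetical names for illustration, not an existing
loader API:

  #include <setjmp.h>

  /* hypothetical entry points, for illustration only */
  void arch_set_trap_hook(void (*hook)(int));
  int run_loader(int safe_mode);

  static jmp_buf safe_env;
  static int safe_mode = 0;

  /* hypothetical arch trap hook: on segv etc., jump back to
     loader_main() instead of wedging the machine */
  static void
  trap_hook(int trapno)
  {
          (void)trapno;   /* which trap fired; unused in this sketch */
          longjmp(safe_env, 1);
  }

  int
  loader_main(void)
  {
          if (setjmp(safe_env) != 0)
                  safe_mode = 1;          /* a trap fired; retry minimally */
          arch_set_trap_hook(trap_hook);  /* hypothetical hook installer */
          return (run_loader(safe_mode)); /* hypothetical main loop */
  }

The part I expect to be chaotic is the other variant, where the firmware
is trusted to fall through to the next bootable BootXXXX entry on failure
and that entry selects the safe mode.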


-- 
Tomoaki AOKI    <junchoon@dec.sakura.ne.jp>