Re: Using a recovery partition to repair a broken installation of FreeBSD

From: Warner Losh <imp_at_bsdimp.com>
Date: Tue, 02 Sep 2025 19:25:59 UTC
On Tue, Sep 2, 2025 at 7:55 AM Tomoaki AOKI <junchoon@dec.sakura.ne.jp>
wrote:

> On Mon, 1 Sep 2025 21:02:45 -0600
> Warner Losh <imp@bsdimp.com> wrote:
>
> > On Mon, Sep 1, 2025 at 5:42 AM Tomoaki AOKI <junchoon@dec.sakura.ne.jp>
> > wrote:
> >
> > > On Mon, 1 Sep 2025 03:15:50 -0600
> > > Warner Losh <imp@bsdimp.com> wrote:
> > >
> > > > On Mon, Sep 1, 2025, 3:05 AM Poul-Henning Kamp <phk@phk.freebsd.dk>
> > > > > wrote:
> > > >
> > > > > --------
> > > > > Tomoaki AOKI writes:
> > > > >
> > > > >
> > > > > > > >  > … it would be nice to have something like 'recovery
> > > > > > > > partition', as some OSes have. or at least some tiny
> > > > > > > > fail-safe feature. having remote machine in some distant
> > > > > > > > datacenter, booting from a flashstick is always a problem.
> > > > >
> > > > > I thought that is what /rescue is for ?
> > > > >
> > > >
> > > > That only works if your boot loader can read it... I've thought for a
> > > > while now that maybe we should move that into a ram disk image that
> > > > we fall back to if the boot loader can't read anything else...
> > > >
> > > > Warner
> > >
> > > Exactly. If the loader (or bootcode to kick the loader in the
> > > partition/pool) can sanely read the partition/pool to boot from,
> > > I think /rescue is enough and no need for rescue "partition / pool".
> > >
> > > But once the partition / pool to boot is broken (including lost
> > > decryption key for encrypted partitions/drives from regular place),
> > > something others are needed.
> > >
> > > And what can be chosen to boot from BIOS/UEFI firmware depends on
> > > the implementation (some could restrict per-drive only, instead of
> > > every entry in EFI boot manager table).
> > >
> > > If BIOS/firmware allow to choose "drive" to boot, rescue "drive"
> > > is useful, if multiple physical drives are available.
> > >
> > > Yes, rescue mfsroot embedded into loader.efi would be a candidate, too,
> > > if the size of ESP allows.
> >
> >
> > Rescue is quite small. On the order of 8MB compressed. The trouble is
> > that the kernel is like 12MB compressed, plus we'd need a few more
> > modules. Still, we could likely get something under 25MB that's an MD
> > image that we could boot into, but it would have to be single user. And
> > it's been a while since I did that... Typically I just run /rescue/init
> > or /rescue/sh, which isn't a full system and still uses the system's
> > /etc. If we customized it per system, we could do better, since the
> > kernel can be a bit smaller (our kernels at work are 6MB compressed), so
> > under 20MB could be possible. We'd not need /boot/loader.efi in there.
>
> Oh, much smaller than I've expected!
>
> Actually, using boot1.efi (either stock or patched), users of Root on
> ZFS can have rescue UFS partition on the same drive.
> This is because it looks for /boot/loader.efi to kick from ZFS pool
> first, then, UFS. This is per-drive priority and if both are NOT found,
> boot1.efi looks for another drive with the order that UEFI firmware
> recognized. (The first to try is the drive boot1.efi itself was kicked.)
>
> This is how smh@ implemented when I requested to fix boot issue
> on UEFI boot (at the moment, loader.efi cannot be kicked directly
> by UEFI firmware and needed boot1.efi).
>

This isn't true, at least not generally. We load loader.efi in all new
installations by default. I've fixed a number of past issues around
this... We're not able to use it at Netflix to boot off of ZFS, for
example...
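
For reference, the boot1.efi search order described in the quoted text
above (not what loader.efi does when the firmware starts it directly)
can be sketched roughly as below. This is only a minimal Python model of
the description as given: the Drive class, pick_loader(), and the drive
names are hypothetical and for illustration only. The real logic is in
boot1.efi's C source and may differ in detail.

```python
from dataclasses import dataclass, field


@dataclass
class Drive:
    """Hypothetical model of one drive as seen by the UEFI firmware."""
    name: str
    # Filesystem types on this drive that hold a /boot/loader.efi.
    has_loader: set = field(default_factory=set)


def pick_loader(drives, boot_drive):
    """Return (drive name, fs type) of the first loader found, else None.

    `drives` is in the order the firmware recognized them; `boot_drive`
    is the name of the drive boot1.efi itself was started from.
    """
    # The drive boot1.efi was started from is tried first, then the
    # remaining drives in firmware order.
    ordered = [d for d in drives if d.name == boot_drive]
    ordered += [d for d in drives if d.name != boot_drive]
    for drive in ordered:
        # Per-drive priority: ZFS pool first, then UFS.
        for fs in ("zfs", "ufs"):
            if fs in drive.has_loader:
                return (drive.name, fs)
    return None
```

In this model, a drive whose ZFS pool no longer yields a loader but
which carries a UFS rescue partition falls through to UFS on the same
drive, and a memstick that boot1.efi was started from wins over the
internal drives.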


> Maybe Warner would remember, before the fix, boot1.efi always looked for
> /boot/loader.efi with the order UEFI firmware recognized drives,
> thus, even if started from USB memstick for rescue, boot1.efi
> "always" kicked the first "internal" drive and cannot rescue.
> Yes, fresh installations were OK with it, as there's no /boot/loader.efi
> in any of internal drives.
>

Yea, I'm not remembering it...


> > If we could hook into the arch specific traps that cause segv, etc, we
> > could do a setjmp early and set 'safe mode' and restart.  Though that may
> > be trickier than I initially am thinking... maybe the best bet is to let
> > uefi catch that failure and have the next bootable BootXXXX environment
> > on the list specify a safe mode. More investigation might be needed.
> >
> > Warner
>
> Yeah, and it could be (and would actually be) implementation-specific.
> Maybe chaotic in real world and lots of quirks would be required.
>

I don't understand that part... It would be architecture specific, but why
would it be implementation specific?

Warner