Re: Using a recovery partition to repair a broken installation of FreeBSD
Date: Tue, 02 Sep 2025 19:25:59 UTC
On Tue, Sep 2, 2025 at 7:55 AM Tomoaki AOKI <junchoon@dec.sakura.ne.jp> wrote:
> On Mon, 1 Sep 2025 21:02:45 -0600
> Warner Losh <imp@bsdimp.com> wrote:
>
> > On Mon, Sep 1, 2025 at 5:42 AM Tomoaki AOKI <junchoon@dec.sakura.ne.jp> wrote:
> >
> > > On Mon, 1 Sep 2025 03:15:50 -0600
> > > Warner Losh <imp@bsdimp.com> wrote:
> > >
> > > > On Mon, Sep 1, 2025, 3:05 AM Poul-Henning Kamp <phk@phk.freebsd.dk> wrote:
> > > >
> > > > > --------
> > > > > Tomoaki AOKI writes:
> > > > >
> > > > > > … it would be nice to have something like a 'recovery partition',
> > > > > > as some OSes have, or at least some tiny fail-safe feature. Having
> > > > > > a remote machine in some distant datacenter, booting from a flash
> > > > > > stick is always a problem.
> > > > >
> > > > > I thought that is what /rescue is for?
> > > >
> > > > That only works if your boot loader can read it... I've thought for a
> > > > while now that maybe we should move that into a RAM disk image that we
> > > > fall back to if the boot loader can't read anything else...
> > > >
> > > > Warner
> > >
> > > Exactly. If the loader (or the bootcode that kicks the loader in the
> > > partition/pool) can sanely read the partition/pool to boot from,
> > > I think /rescue is enough and there's no need for a rescue
> > > "partition/pool".
> > >
> > > But once the partition/pool to boot from is broken (including a
> > > decryption key for encrypted partitions/drives that is lost from its
> > > regular place), something else is needed.
> > >
> > > And what can be chosen to boot from the BIOS/UEFI firmware depends on
> > > the implementation (some could restrict the choice to per-drive only,
> > > instead of every entry in the EFI boot manager table).
> > >
> > > If the BIOS/firmware allows choosing a "drive" to boot, a rescue "drive"
> > > is useful, if multiple physical drives are available.
> > >
> > > Yes, a rescue mfsroot embedded into loader.efi would be a candidate,
> > > too, if the size of the ESP allows.
> >
> > Rescue is quite small. On the order of 8MB compressed. The trouble is
> > that the kernel is like 12MB compressed, plus we'd need a few more
> > modules. Still, we could likely get something under 25MB that's an MD
> > image we could boot into, but it would have to be single user. And it's
> > been a while since I did that... Typically I just run /rescue/init or
> > /rescue/sh, which isn't a full system and still uses the system's /etc.
> > If we customized it per system, we could do better, since the kernel can
> > be a bit smaller (compressed, our kernels at work are 6MB), so under
> > 20MB could be possible. We'd not need /boot/loader.efi in there.
>
> Oh, much smaller than I expected!
>
> Actually, using boot1.efi (either stock or patched), users of Root-on-ZFS
> can have a rescue UFS partition on the same drive.
> This is because it looks for /boot/loader.efi to kick from a ZFS pool
> first, then UFS. This is a per-drive priority, and if neither is found,
> boot1.efi looks for another drive in the order the UEFI firmware
> recognized them. (The first drive to try is the one boot1.efi itself was
> kicked from.)
>
> This is how smh@ implemented it when I requested a fix for a boot issue
> on UEFI boot (at that time, loader.efi could not be kicked directly by
> the UEFI firmware, so boot1.efi was needed).

This isn't true, at least not generally. We load loader.efi in all new
installations by default. I've fixed a number of issues around this from
the past... We're not able to use it at Netflix to boot off of ZFS, for
example...
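The per-drive search order described above, as a rough compilable sketch;
probe_zfs_pool() and probe_ufs() are invented stubs standing in for the real
filesystem probes, not boot1.efi's actual interfaces:

#include <stdbool.h>
#include <stdio.h>

/* Stubs for illustration: pretend no pool is found, and drive 1 has UFS. */
static bool
probe_zfs_pool(int drive)
{
	(void)drive;
	return (false);
}

static bool
probe_ufs(int drive)
{
	return (drive == 1);
}

static bool
try_drive(int drive)
{
	/* Per drive: prefer /boot/loader.efi on a ZFS pool, then on UFS. */
	if (probe_zfs_pool(drive) || probe_ufs(drive)) {
		printf("would chainload /boot/loader.efi from drive %d\n", drive);
		return (true);
	}
	return (false);
}

int
main(void)
{
	int boot_drive = 0, ndrives = 3, i;

	/* Try the drive boot1.efi itself was started from first... */
	if (try_drive(boot_drive))
		return (0);
	/* ...then the remaining drives in firmware enumeration order. */
	for (i = 0; i < ndrives; i++)
		if (i != boot_drive && try_drive(i))
			return (0);
	return (1);
}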
> Maybe Warner would remember: before the fix, boot1.efi always looked for
> /boot/loader.efi in the order the UEFI firmware recognized drives;
> thus, even when started from a USB memstick for rescue, boot1.efi
> "always" kicked the first "internal" drive and could not rescue.
> Yes, fresh installations were OK with it, as there was no
> /boot/loader.efi on any of the internal drives.

Yeah, I'm not remembering it...

> > If we could hook into the arch-specific traps that cause segv, etc., we
> > could do a setjmp early and set 'safe mode' and restart. Though that may
> > be trickier than I'm initially thinking... maybe the best bet is to let
> > UEFI catch that failure and have the next bootable BootXXXX environment
> > on the list specify a safe mode. More investigation might be needed.
> >
> > Warner
>
> Yeah, and it could be (and actually would be) implementation-specific.
> Maybe chaotic in the real world, and lots of quirks would be required.

I don't understand that part... It would be architecture-specific, but why
would it be implementation-specific?

Warner
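P.S. A rough userland sketch of the "setjmp early, fall back to safe mode"
idea quoted above. In the loader the hook would live in the arch-specific
trap handler rather than in a SIGSEGV handler; boot_safe_mode and the
boot_*() functions below are invented names for illustration, not existing
loader interfaces:

#include <setjmp.h>
#include <signal.h>
#include <stdio.h>

static sigjmp_buf fault_env;
static volatile sig_atomic_t boot_safe_mode = 0;

static void
fault_handler(int sig)
{
	(void)sig;
	siglongjmp(fault_env, 1);	/* unwind back to the early setjmp */
}

static void
boot_normal(void)
{
	/* Simulate the loader faulting while reading a damaged pool. */
	volatile int *p = NULL;
	*p = 42;
}

static void
boot_rescue(void)
{
	printf("safe mode: would boot the embedded MD rescue image\n");
}

int
main(void)
{
	signal(SIGSEGV, fault_handler);

	if (sigsetjmp(fault_env, 1) != 0)
		boot_safe_mode = 1;	/* we arrived here via a fault */

	if (boot_safe_mode)
		boot_rescue();
	else
		boot_normal();
	return (0);
}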