Date: Wed, 3 Sep 2025 18:28:34 +0900
From: Tomoaki AOKI
To: Warner Losh
Cc: Poul-Henning Kamp, Graham Perrin, FreeBSD-CURRENT
Subject: Re: Using a recovery partition to repair a broken installation of FreeBSD

On Tue, 2 Sep 2025 13:25:59 -0600
Warner Losh wrote:

> On Tue, Sep 2, 2025 at 7:55 AM Tomoaki AOKI
> wrote:
>
> > On Mon, 1 Sep 2025 21:02:45 -0600
> > Warner Losh wrote:
> >
> > > On Mon, Sep 1, 2025 at 5:42 AM Tomoaki AOKI
> > > wrote:
> > >
> > > > On Mon, 1 Sep 2025 03:15:50 -0600
> > > > Warner Losh wrote:
> > > >
> > > > > On Mon, Sep 1, 2025, 3:05 AM Poul-Henning Kamp
> > > > > wrote:
> > > > >
> > > > > > --------
> > > > > > Tomoaki AOKI writes:
> > > > > > >
> > > > > > > > … it would be nice to have something like 'recovery
> > > > > > > > partition', as some OSes have. or at least some tiny
> > > > > > > > fail-safe feature. having remote machine in some distant
> > > > > > > > datacenter, booting from a flashstick is always a problem.
> > > > > >
> > > > > > I thought that is what /rescue is for ?
> > > > > >
> > > > >
> > > > > That only works if your boot loader can read it... I've thought
> > > > > for a while now that maybe we should move that into a ram disk
> > > > > image that we fall back to if the boot loader can't read
> > > > > anything else...
> > > > >
> > > > > Warner
> > > >
> > > > Exactly. If the loader (or bootcode to kick the loader in the
> > > > partition/pool) can sanely read the partition/pool to boot from,
> > > > I think /rescue is enough and no need for rescue "partition / pool".
> > > >
> > > > But once the partition / pool to boot is broken (including lost
> > > > decryption key for encrypted partitions/drives from regular place),
> > > > something others are needed.
> > > >
> > > > And what can be chosen to boot from BIOS/UEFI firmware depends on
> > > > the implementation (some could restrict per-drive only, instead of
> > > > every entry in EFI boot manager table).
> > > >
> > > > If BIOS/firmware allow to choose "drive" to boot, rescue "drive"
> > > > is useful, if multiple physical drives are available.
> > > >
> > > > Yes, rescue mfsroot embedded into loader.efi would be a candidate,
> > > > too, if the size of ESP allows.
> > > >
> > >
> > > Rescue is quite small. On the order of 8MB compressed. The trouble is
> > > that the kernel is like 12MB compressed, plus we'd need a few more
> > > modules. Still, we could likely get something under 25MB that's an MD
> > > image that we could boot into, but it would have to be single user.
> > > And It's been a while since I did that... Typically I just run
> > > /rescue/init or /rescue/sh, which isn't a full system and still uses
> > > the system's /etc. If we customized it per system, we could do better,
> > > since the kernel can be a bit smaller (compressed our kernels at work
> > > are 6MB), so under 20MB could be possible. We'd not need
> > > /boot/loader.efi in there.
> >
> > Oh, much smaller than I've expected!
> >
> > Actually, using boot1.efi (either stock or patched), users of Root on
> > ZFS can have rescue UFS partition on the same drive.
> > This is because it looks for /boot/loader.efi to kick from ZFS pool
> > first, then, UFS. This is per-drive priority and if both are NOT found,
> > boot1.efi looks for another drive with the order that UEFI firmware
> > recognized. (The first to try is the drive boot1.efi itself was kicked.)
> >
> > This is how smh@ implemented when I requested to fix boot issue
> > on UEFI boot (at the moment, loader.efi cannot be kicked directly
> > by UEFI firmware and needed boot1.efi).
> >
>
> This isn't true, at least not generally. We load loader.efi in all new
> installations by default. I've fixed a number of issues around this from
> the past... We're not able to use it at netflix to boot off of ZFS, for
> example...

This is why I believe you're the best person to ask about the loader. ;-)

> > Maybe Warner would remember, before the fix, boot1.efi always looked for
> > /boot/loader.efi with the order UEFI firmware recognized drives,
> > thus, even if started from USB memstick for rescue, boot1.efi
> > "always" kicked the first "internal" drive and cannot rescue.
> > Yes, fresh installations was OK with it, as there's no /boot/loader.efi
> > in any of internal drives.
> >
>
> Yea, I'm not remembering it...

It was late January 2016:
https://lists.freebsd.org/pipermail/freebsd-current/2016-January/059387.html

> > > If we could hook into the arch specific traps that cause segv, etc, we
> > > could do a setjmp early and set 'safe mode' and restart. Though that
> > > may be trickier than I initially am thinking... maybe the best bet is
> > > to let uefi catch that failure and have the next bootable BootXXXX
> > > environment on the list specify a safe mode. More investigation might
> > > be needed.
> > >
> > > Warner
> >
> > Yeah, and it could be (and would actually be) implementation-specific.
> > Maybe chaotic in real world and lots of quirks would be required.
> >
>
> I don't understand that part... It would be architecture specific, but why
> would it be implementation specific?
>
> Warner

Even for mandatory features, some implementations misread the spec and
end up behaving in implementation-specific ways, especially in the early
phase of a standard, unfortunately. Do you remember the early PCI (not
PCIe!) incompatibility issues? Early USB had them too, IIRC.

-- 
Tomoaki AOKI