Re: nanobsd [was Re: Cross compiling user applications for armv7]
- Reply: Sulev-Madis Silber : "Re: nanobsd [was Re: Cross compiling user applications for armv7]"
- Reply: Warner Losh : "Re: nanobsd [was Re: Cross compiling user applications for armv7]"
- In reply to: Karl Denninger : "Re: nanobsd [was Re: Cross compiling user applications for armv7]"
Date: Sun, 21 Sep 2025 21:04:54 UTC
On Sep 21, 2025, at 12:30, Karl Denninger <karl@denninger.net> wrote:

> On 9/21/2025 15:01, Warner Losh wrote:
>>
>> On Sun, Sep 21, 2025 at 7:40 AM Karl Denninger <karl@denninger.net> wrote:
>> On 9/20/2025 19:19, Sulev-Madis Silber wrote:
>>>
>>> On September 20, 2025 2:34:06 PM GMT+03:00, Karl Denninger <karl@denninger.net> wrote:
>>> ...
>>>
>>>> There are ways to have that also be two-root-partition, allowing "near-line" updates (update the other partition with the new OS code, then reboot to activate it) provided you have a deterministic way to know which device the loader will boot from. On EFI boot machines this can be problematic to obtain deterministically once the system is running.
>>>
>>> there was some discussion somewhere about boot switching troubles
>>>
>>> unsure if even gpt helps here with its dual part tables
>>>
>>> in my case i replace env files in efi part to control boot switch. it's a bad hack. i use cp, sync, mv, sync, etc magic to make it more power fail resistant
>>>
>>> i wish there's some sane way to do that. maybe loader could have changes, so you don't need to muck around with currdev & rootdev and what else. perhaps boot by ufs label?
>>>
>>> in my case i finally settled on ufs label of rootfs-<unixtimestamp>. my approach writes full raw fs images which stay unmodified
>>>
>>> and what about zfs?

Relative to booting zfs from a MBR+BIOS context:

QUOTE
author     John Baldwin <jhb@FreeBSD.org>  2025-07-28 14:57:16 +0000
committer  John Baldwin <jhb@FreeBSD.org>  2025-07-28 14:58:02 +0000
commit     a3b72d89c7028a2254381e4d2b126416dee3fbb5 (patch)
tree       efed1826b47c73b0abe223fb3fb99c91de27286b
parent     e958bc1c13377767d9b2cf87d072d923aa3d482a (diff)

zfsboot: Remove zfsboot(8) program used to boot ZFS from MBR + BIOS

This has not worked since the import of OpenZFS in FreeBSD 13.0.
Trying to fix it at this point would probably entail rearchitecting
how it works (e.g. using a dedicated freebsd-boot slice to hold
zfsboot).
However, it's not really worth doing that at this point.
END QUOTE

# ~/fbsd-branches-containing.sh a3b72d89c7028a2254381e4d2b126416dee3fbb5
* main
  remotes/origin/HEAD -> origin/main
  remotes/origin/main
  remotes/origin/stable/15

>>> i might also need to have double pool system for resilience
>>>
>>> i battled with this before efi too, 10y ago, in embedded. then it was fun if somewhere between uboot stages and fbsd loader / kernel, the boot order magically changes
>>>
>>> unsure if zfs could be used here and is it enough. is it better in embedded? zfs also has benefits like copies=3 and compression. and ability to withstand power failures. unsure about which extent but on ufs i was once cursing on 0b file. but at least on zfs the bootfs is a metadata and that's much better than file on fs
>>>
>>> any opinions here?

>> The basic problem is that the EFI loader has its own ideas about the enumeration order of the devices on the machine, and you don't know what they'll be. If you want a "universal" media that will boot on legacy (no EFI *possible*, which is the case for some such as pcEngines boxes) *and* will work on EFI, you have a quandary.

If pcEngines require booting via MBR+BIOS, then booting ZFS for 15.0+ looks to not be an option for them, probably not for 13.0+ even.

>> I fixed the build issue in that such boxes typically can't boot GPT media either,

So 15.0+ will not support booting ZFS, presuming MBR+BIOS is the alternative.

>> but that is fixable because you can still have a partition layout that looks like this on MBR:
>> 1. Partition "1"
>> 2. Partition "2"
>> 3. EFI (ignored for a non-EFI box)
>> 4.
>> "Data partition" which is then sub-partitioned into "cfg" and "data"
>>
>> Looks like this on a USB stick when running:
>>
>> =>       63  60125121  da0  MBR  (29G)
>>          63  11257500    1  freebsd  [active]  (5.4G)
>>    11257563  11257500    2  freebsd  (5.4G)
>>    22515063     81920    3  efi  (40M)
>>    22596983    840517    4  freebsd  (410M)
>>    23437500  36687684       - free -  (17G)
>>
>> =>        0  11257500  da0s1  BSD  (5.4G)
>>           0        16         - free -  (8.0K)
>>          16  11257484      1  freebsd-ufs  (5.4G)
>>
>> =>        0  11257500  da0s2  BSD  (5.4G)
>>           0        16         - free -  (8.0K)
>>          16  11257484      1  freebsd-ufs  (5.4G)
>>
>> =>        0    840517  da0s4  BSD  (410M)
>>           0     62500      1  freebsd-ufs  (31M)
>>       62500    750000      4  freebsd-ufs  (366M)
>>      812500     28017         - free -  (14M)
>>
>> For an MBR/CSM boot (non-EFI) you simply set the "active" partition after updating the other, and that one is booted -- that works as it always has, in that it tells the system what to boot. The same is true if you use GPT with the "bootme" flag. The problem with EFI is that you need to know what the EFI loader will call the disk so you can set "rootdev=...s1a" or "s2a", since the EFI loader ignores the partition "active" marker -- particularly if you want "one build that works even on systems with no EFI or capacity to boot a GPT disk."
>>
>> There is a GPTBOOT.EFI that replicates the old gptboot protocol, but it's rather fragile so isn't enabled by default. It was written for the ping-pong setup where you can't rely on EFI env vars to drive the EFI boot manager, but instead mark the partitions. IMHO, though, it's really no different than setting a file in the ESP the loader reads, and the latter is more generic. gptboot.efi, though, is a good place to look if you want to do ping-pong on GPT booted machines.
>>
>> I have found no deterministic way to know what that will be (e.g. "disk0" is the obvious, but that makes a presumption -- there is no other media that could be enumerated. What if there is?) once the box is booted and running.
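[As an aside on the MBR "active" flip mentioned above: a dry-run sketch, assuming the da0 layout with ping-pong slices s1/s2 shown in this thread; the flip_cmds helper is hypothetical and only prints the gpart commands rather than running them.]

```shell
#!/bin/sh
# Dry-run sketch of the MBR "active"-flag ping-pong (assumption: da0 with
# ping-pong slices s1/s2, as in the gpart show output in this thread).
# flip_cmds is a hypothetical helper: given the slice we are currently
# running from, it prints the gpart commands that would mark the *other*
# slice active. For GPT media the analogous attribute is "bootme".
flip_cmds() {
    cur=$1                # slice currently booted: 1 or 2
    other=$((3 - cur))    # the ping-pong partner slice
    echo "gpart unset -a active -i $cur da0"
    echo "gpart set -a active -i $other da0"
}

flip_cmds 1
```

[Running the printed commands (as root, after updating the other slice) is what flips which slice a CSM/BIOS boot selects; as noted above, the EFI loader ignores this flag.]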
>> There is nothing visible in sysctl, for example, that tells me deterministically where the loader got the running system's root from.
>>
>> EFI variables tell you that.
>>
>> cfee69ad-a0de-47a9-93a8-f63106f8ae99-LoaderPath
>> \EFI\FREEBSD\LOADER.EFI
>>
>> cfee69ad-a0de-47a9-93a8-f63106f8ae99-LoaderDev
>> PciRoot(0x1)/Pci(0x1,0x1)/Pci(0x0,0x0)/NVMe(0x1,00-00-00-00-00-00-00-00)/HD(1,GPT,B05DF68B-625D-11EB-81AA-E0D55E1E73BD,0x28,0x96000)
>>
>> You can then take the HD(...) and match it to the efimedia that geom publishes to find this. This is the loaddev from the bootloader, but not the partition that we booted off of. The loader is responsible for setting vfs.root.mountfrom to tell the kernel where to get its root from. We do this by looking at /etc/fstab on the load device for the / entry, since the loader doesn't know how to translate loader name space to FreeBSD name space. The EFI loader can get at the UEFI path, which we also export in various places like geom and devinfo.
>>
>> We could trivially add -KernelDev and -KernelPath EFI variables to the mix.
>>
>> For ZFS, it's just the BE. And you get out of the ping/pong hell by making all that well managed, and off-line upgradeable.
>>
>> If you're willing to build EFI-only and ZFS, for example, then you could use the "bootfs" pool property for this and that should work as expected (beadm does this). But on small-RAM systems ZFS is ill-advised, and a lot of "embedded" applications are small-RAM.
>>
>> Does FreeBSD even run on a system with less than 1GB?

> It does run perfectly well on 1Gb machines, specifically Pi3s (which are aarch64).

For amd64 and aarch64, having, say, /boot/loader.conf contain something like:

hw.physmem="512M"

is an approximate way of checking on even smaller memory configurations without having a machine that actually has so little RAM.

> ---<<BOOT>>---
> WARNING: Cannot find freebsd,dts-version property, cannot check DTB compliance
> Copyright (c) 1992-2023 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
> The Regents of the University of California. All rights reserved.
> FreeBSD is a registered trademark of The FreeBSD Foundation.
> FreeBSD 14.3-STABLE #0 stable/14-n271913-1821af77efef-dirty: Fri Jul 11 08:45:25 EDT 2025
> karl@NewFS.denninger.net:/work/OBJ/ARM64-14-STABLE/obj/usr/src.14-STABLE/arm64.aarch64/sys/GENERIC arm64
> FreeBSD clang version 19.1.7 (https://github.com/llvm/llvm-project.git llvmorg-19.1.7-0-gcd708029e0b2)
> VT(efifb): resolution 656x416
> module scmi already present!
> real memory  = 994041856 (947 MB)
> avail memory = 945131520 (901 MB)
> Starting CPU 1 (1)
> Starting CPU 2 (2)
> Starting CPU 3 (3)
> FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
>
> Zfs is a material overhead on a machine of this configuration, particularly considering that it is a "nanobsd" environment. I could build a zfs root filesystem, figure out how to send/receive that into the box, and then set where it boots from on the pool as a means of "ping-pong" updating, but I haven't tried loading zfs on these little things; my understanding is that on <4Gb machines that is unwise, and a microSD card is a pretty low-performance thing as well, in addition to having a tendency to get quite unhappy if written to a great deal in non-aligned small-block transactions. (I've collected quite a few dead cards this way over the last 10ish years.)

The Design and Implementation of the FreeBSD Operating System, second edition, has various related notes, such as on page 548, 2nd bullet:

"Like all non-overwriting filesystems, ZFS operates best when at least a quarter of its disk pool is free. Write throughput becomes poor when the pool gets too full. By contrast, UFS can run well to 95 percent full and acceptably to 99 percent full."

Also, page 549:

"ZFS was designed to manage and operate enormous filesystems easily, which it does well.
Its design assumed that it would have many fast 64-bit CPUs with large amounts of memory to support these enormous filesystems. When these resources are available, it works extremely well. However, it is not designed for or well suited to run on resource-constrained systems using 32-bit CPUs with less than 8 Gbytes of memory and one small, nearly full disk, which is typical of many embedded systems."

Note: I take that last paragraph to mean that even one of:

) 32-bit CPUs
) few 64-bit CPUs
) slow 64-bit CPUs
) less than 8 Gbytes of RAM (or insufficient RAM more generally)
) nearly full disk (slow storage would make that worse)

can lead to the "not well suited" status for ZFS use: for example, CPU count generally cannot substitute for having insufficient RAM or the like. (RAM+SWAP tradeoffs are less obvious for ZFS contexts, given the ZFS use of Wired memory.)

Also on page 548:

"ZFS caches its data in its ARC that is not part of the unified-memory cache managed by the virtual memory. The result is that when mmap is used on a ZFS file, . . . This approach provides coherency between memory-mapped and I/O access at the expense of wasted memory due to having 2 copies of the file in memory and extra overhead caused by the need to copy the contents between the two copies."

(sendfile is also mentioned as having a similar issue.) Such behavior contributes to the RAM-size issue.

>>> It would be nice if the EFI loader passed to the kernel where it loaded from (e.g. its idea of "rootdev" at the time it ran), which the kernel could then stash in a sysctl-visible place. That doesn't prevent someone from screwing it up by plugging some other device into the box (thus potentially changing the EFI BIOS enumeration order), but so long as the physical configuration doesn't change, that should be good enough.
>>>
>> I'm not sure how that helps. It already sets vfs.mount_from which you can get to via the kenv program.
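[To illustrate using those loader-provided hints: a sketch, in which the LoaderDev string is the sample quoted earlier in this thread. On a live system it would come from efivar -N cfee69ad-a0de-47a9-93a8-f63106f8ae99-LoaderDev, and the kernel's root device from kenv vfs.root.mountfrom; the sed-based extraction is an assumption about the device-path format, not an established tool.]

```shell
#!/bin/sh
# Sketch: recovering "which device did EFI boot from?" from the loader-set
# hints. Assumption: the LoaderDev string below is the sample quoted in this
# thread; on a live system it would come from:
#   efivar -N cfee69ad-a0de-47a9-93a8-f63106f8ae99-LoaderDev
# and the kernel's root device from:
#   kenv vfs.root.mountfrom
loaderdev='PciRoot(0x1)/Pci(0x1,0x1)/Pci(0x0,0x0)/NVMe(0x1,00-00-00-00-00-00-00-00)/HD(1,GPT,B05DF68B-625D-11EB-81AA-E0D55E1E73BD,0x28,0x96000)'

# Isolate the HD(...) element of the UEFI device path.
hd=$(printf '%s\n' "$loaderdev" | sed -n 's/.*\(HD([^)]*)\).*/\1/p')

# Its GPT partition UUID is what geom publishes as the "efimedia" attribute,
# so it can be searched for in "geom part list" output to find the
# corresponding FreeBSD partition name.
uuid=$(printf '%s\n' "$hd" | sed -n 's/HD([^,]*,GPT,\([^,]*\),.*/\1/p')

echo "HD element: $hd"
echo "GPT UUID:   $uuid"
```

[The matching step itself would then be along the lines of grepping "geom part list" for that efimedia string, which is one concrete way to do the HD(...)-to-geom match described above.]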
>> But it isn't the loader's notion of diskXXX, which, honestly, in an EFI world can be fraught.

> But in order for it to work you need to know the diskXXX it loaded from; that is, when you ping-pong you need to set, in the EFI partition, a loader.env file with:
>
> "rootdev=disk0s1a"
>
> (or s2a, or whatever) so the next time it boots the loader grabs off the right partition.
>
> If it's ZFS then yes, it can be constructed to work, but if it's not then it doesn't. If the EFI loader, in the *absence* of a "rootdev" entry in loader.env (which obviously should take precedence if set there), were to look for the "active" flag for MBR partitions (or "bootme" for GPT partitions), then it would work for UFS as well. But the EFI loader (unless something has recently changed, and I don't think it has) currently ignores that and boots the first partition it finds that appears to be bootable -- which makes ping-pong not work unless you override it in loader.env within the EFI partition.
>
> --
> Karl Denninger
> karl@denninger.net
> The Market Ticker
> [S/MIME encrypted email preferred]

===
Mark Millard
marklmi at yahoo.com