Direct Linux loading for bhyve

From: Rob Norris <robn_at_despairlabs.com>
Date: Sun, 05 May 2024 01:56:20 UTC

Hi all,

Last year I did some work on adding support to bhyve to load a Linux
kernel directly, without needing to create a disk image or configure a
bootloader. I showed a few people at the Dev Summit in Taipei in March,
and the concept was generally well received, so I'm writing this email
to describe where I'm at, where I want to take it and seek comments,
ideas and guidance on how to proceed.


The initial motivation was to be able to do the equivalent of QEMU's
-kernel, -append and -initrd options with bhyve, to boot a Linux
kernel directly. (For me, it's so I can port my kernel dev tool
"quiz"[1] to FreeBSD, though that's only tangentially related.)

To do this I added a "loader" class to bhyve, and then wrote a loader
that implements the Linux x86 boot protocol.

Some links:

* Prototype:
     https://github.com/robn/freebsd-src/tree/bhyve-loader-linux/usr.sbin/bhyve

* Demo run using the kernel and initrd from a Debian installer iso:
     https://asciinema.org/a/FuXehcd5MkWb7LE15s1VT2ugK


I'll describe how it's put together here.

loader.h and loader.c define a trivial struct loader; each loader
module defines an instance of it and adds it to loader_set.

     struct loader {
         const char *l_name;
         int (*l_setup_memory)(struct vmctx *ctx);
         int (*l_setup_boot_cpu)(struct vmctx *ctx, struct vcpu *vcpu);
     };

     static const struct loader loader_linux = {
         .l_name = "linux",
         .l_setup_memory = loader_linux_setup_memory,
         .l_setup_boot_cpu = loader_linux_setup_boot_cpu,
     };
     LOADER_SET(loader_linux);
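
LOADER_SET is just the usual linker set registration, in the same style
as bhyve's existing PCI_EMUL_SET and friends. Finding the requested
loader then looks roughly like this (a sketch of the idea; the exact
code in the branch differs in detail):

     #include <sys/linker_set.h>
     #include <string.h>

     SET_DECLARE(loader_set, struct loader);
     #define LOADER_SET(x)    DATA_SET(loader_set, x)

     /* find a loader by name (eg the value of the loader.name option) */
     static const struct loader *
     loader_find(const char *name)
     {
         struct loader **lpp;

         SET_FOREACH(lpp, loader_set) {
             if (strcmp((*lpp)->l_name, name) == 0)
                 return (*lpp);
         }
         return (NULL);
     }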

It's pretty straightforward: after memory is created, l_setup_memory()
is called to load whatever is wanted into it. Then, once the boot CPU is
created, l_setup_boot_cpu() is called to set initial registers and
insert anything needed to hook up the final memory map, device state or
whatever else. It's not so different to the existing bootrom support
(indeed, an early version was just setting it up as an alternate
bootrom).
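
Wired into bhyve's startup path, that works out to something like this
(a paraphrase of the shape, not the actual diff; loader_activate() and
loader_find() are illustrative names):

     #include <errno.h>

     #include <vmmapi.h>

     /*
      * Glue in bhyve's startup path: l_setup_memory() fires right after
      * vm_setup_memory() has created guest memory, l_setup_boot_cpu()
      * once the BSP's struct vcpu exists.
      */
     static int
     loader_activate(struct vmctx *ctx, struct vcpu *bsp, const char *name)
     {
         const struct loader *loader;
         int error;

         if ((loader = loader_find(name)) == NULL)
             return (ENOENT);

         /* load kernel/initrd/whatever into the fresh guest memory */
         if ((error = loader->l_setup_memory(ctx)) != 0)
             return (error);

         /* then point the boot cpu at it */
         return (loader->l_setup_boot_cpu(ctx, bsp));
     }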

The details are in amd64/loader_linux.c. For a second opinion, I wrote
a loader_multiboot2.c[2], though it's not finished and not working
properly. I suspect it's not far off, but in any case it shows the
shape.
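
For anyone who hasn't stared at the Linux boot protocol before, the
guts of the memory-side setup are roughly as follows. This is a heavily
simplified sketch, not the code in the branch: the guest addresses and
the gpa_copyin() helper are made up, error handling is missing, and
most of the setup header is ignored; loader_linux.c is the real
reference.

     #include <sys/endian.h>
     #include <stdint.h>
     #include <string.h>

     #include <vmmapi.h>

     /* illustrative guest-physical addresses; the real loader picks its own */
     #define GPA_ZEROPAGE   0x00010000UL
     #define GPA_CMDLINE    0x00020000UL
     #define GPA_KERNEL     0x01000000UL   /* bzImage pref_address, usually 16M */
     #define GPA_INITRD     0x04000000UL

     /* copy a host buffer into guest-physical memory */
     static void
     gpa_copyin(struct vmctx *ctx, uint64_t gpa, const void *src, size_t len)
     {
         memcpy(vm_map_gpa(ctx, gpa, len), src, len);
     }

     /*
      * Load a bzImage + initrd + cmdline per the 64-bit boot protocol
      * (Documentation/arch/x86/boot.rst in the Linux tree). Offsets are
      * into the "zero page"; most header fields are ignored.
      */
     static int
     linux_load(struct vmctx *ctx, const uint8_t *kernel, size_t kernel_len,
         const uint8_t *initrd, size_t initrd_len, const char *cmdline)
     {
         uint8_t zp[4096];     /* the "zero page" (struct boot_params) */
         size_t setup_len;

         memset(zp, 0, sizeof (zp));

         /* the setup header starts at 0x1f1 in both the file and the
          * zero page, and ends at 0x202 + the jump byte at 0x201 */
         memcpy(zp + 0x1f1, kernel + 0x1f1,
             (0x202 + kernel[0x201]) - 0x1f1);

         zp[0x210] = 0xff;                        /* type_of_loader: unknown */
         le32enc(zp + 0x218, GPA_INITRD);         /* ramdisk_image */
         le32enc(zp + 0x21c, initrd_len);         /* ramdisk_size */
         le32enc(zp + 0x228, GPA_CMDLINE);        /* cmd_line_ptr */

         /* the protected-mode kernel follows the real-mode setup sectors
          * (a setup_sects of 0 really means 4, glossed over here) */
         setup_len = (kernel[0x1f1] + 1) * 512;

         gpa_copyin(ctx, GPA_KERNEL, kernel + setup_len, kernel_len - setup_len);
         gpa_copyin(ctx, GPA_INITRD, initrd, initrd_len);
         gpa_copyin(ctx, GPA_CMDLINE, cmdline, strlen(cmdline) + 1);
         gpa_copyin(ctx, GPA_ZEROPAGE, zp, sizeof (zp));

         /* the 64-bit entry point is then GPA_KERNEL + 0x200, with %rsi
          * pointing at GPA_ZEROPAGE -- that half lives in l_setup_boot_cpu() */
         return (0);
     }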


Apart from the normal matters of style, documentation, testing, and
other "productionising" tasks, there are at least two things to address:

* I need a way to create a VM that will be destroyed when bhyve exits,
   including if it is killed. KVM does it by binding the VM to a file
   descriptor; when the descriptor is closed, the VM is destroyed. That's
   a fairly common pattern in Linux for managing lifetimes of kernel
   resources from userspace. I'm new enough to FreeBSD to not know what
   the common pattern for that kind of thing is (pointers appreciated!).
   Regardless, this feels like it's mostly a case of plumbing.

* I don't have dedicated command line options yet. '-o loader.name=foo'
   is used to select the loader, and any other options the loader has to
   sort out by itself (e.g. the Linux loader knows 'loader.kernel',
   'loader.initrd' and 'loader.cmdline'; see the invocation sketched just
   after this list). Maybe we don't _need_ dedicated command line
   options; I don't know how to decide that.
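
For the curious, a boot with the prototype currently looks something
like this (illustrative; the loader.* keys are the only new part, the
rest is an ordinary bhyve invocation):

     bhyve -c 1 -m 1G -H \
         -o loader.name=linux \
         -o loader.kernel=vmlinuz \
         -o loader.initrd=initrd.img \
         -o loader.cmdline="console=ttyS0" \
         -s 0,hostbridge -s 31,lpc -l com1,stdio \
         linux-test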


And then there's a bunch of observations around possible future areas
for improvement in both bhyve and libvmm. These aren't showstoppers for
some form of loader support landing, but they're definitely places
things can be better:

* The memory layout story doesn't seem very flexible. The Linux loader
   needs to stake out multiple different regions, but there's not really
   any help to know what's already claimed, or to claim regions in turn.
   I just have to pick some spots that probably aren't going to be used
   by any device mapping or similar. It's not that hard, but it seems
   like some kind of allocator concept might be useful. Not unlike the
   e820 allocator perhaps, though these aren't "physical" regions as
   such (that is, they shouldn't be exposed to the OS in the e820
   table).

* Semi-related, something that would help a lot is some ability to map a
   region of host memory directly into the guest address space. Then
   instead of copying things in, we can just mmap() and give the
   pointer to the hypervisor directly. This isn't so important for a
   loader, but as far as I can tell it's a requirement if QEMU itself
   were ever to use bhyve/libvmm for acceleration, as it wants to set up
   its memory layout directly and then just hand it to the hypervisor
   (getting QEMU running is another side project I have on the go, so
   I've thought about this a little bit).

* It does seem like this loader concept has some overlap with the
   bootrom support, and maybe bootrom should be just another kind of
   loader. But, maybe not, since a bootrom is a real device, not just
   stuff in memory.

* It's really, really hard to set up the register state properly. This
   might just be a reflection of the complexity of the problem,
   especially since I'm trying to set things up so the CPU starts in
   64-bit long mode, and there are very few examples of that out there
   (even QEMU installs a tiny bootrom to have the guest do the
   transitions before bouncing into Linux). Regardless, it seems like
   helpers to assist with building the GDT, or setting segment shadow
   registers, or control registers, etc, would make this sort of thing a
   lot easier; a rough sketch of what's involved follows this list.
   (Incidentally, libvmm does have some help here, but only for setting
   up a FreeBSD guest, which seems out of place.)
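
To make that last point concrete, here's the flavour of what the boot
CPU setup has to do to land straight in long mode. It's illustrative
and incomplete (no GDT, IDT, data segments or page-table construction
shown, error checks squashed, and the CS access-rights value is
approximate); GPA_KERNEL and GPA_ZEROPAGE are from the memory sketch
above, and GPA_PML4 stands in for wherever the identity-mapped page
tables were built:

     #include <machine/specialreg.h>
     #include <machine/vmm.h>

     #include <vmmapi.h>

     static int
     setup_boot_cpu_64bit(struct vmctx *ctx, struct vcpu *vcpu)
     {
         int error = 0;

         /* long mode: protected mode + paging, PAE, LME/LMA */
         error |= vm_set_register(vcpu, VM_REG_GUEST_CR0,
             CR0_PE | CR0_PG | CR0_NE);
         error |= vm_set_register(vcpu, VM_REG_GUEST_CR4, CR4_PAE);
         error |= vm_set_register(vcpu, VM_REG_GUEST_EFER,
             EFER_LME | EFER_LMA);
         error |= vm_set_register(vcpu, VM_REG_GUEST_CR3, GPA_PML4);

         /*
          * The hidden/shadow part of %cs has to be filled in directly,
          * since nothing is going to load a GDT for us; 0xa09b is
          * (roughly) present, code, long mode, 4K granularity.
          */
         error |= vm_set_desc(vcpu, VM_REG_GUEST_CS, 0, 0xffffffff, 0xa09b);

         /* Linux 64-bit entry: %rip at startup_64, %rsi -> boot_params */
         error |= vm_set_register(vcpu, VM_REG_GUEST_RIP,
             GPA_KERNEL + 0x200);
         error |= vm_set_register(vcpu, VM_REG_GUEST_RSI, GPA_ZEROPAGE);
         error |= vm_set_register(vcpu, VM_REG_GUEST_RFLAGS, 0x2);

         return (error);
     }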


I think that's everything for now. I'm very interested in any thoughts,
opinions, guidance or complaints people have. I also hang around in
#bhyve on IRC and on the FreeBSD Discord, and I'll be at BSDCan later
this month if you want to chat to me about it.

Cheers,
Rob.


1. https://despairlabs.com/blog/posts/2024-03-04-quiz-rapid-openzfs-development/
2. https://github.com/robn/freebsd-src/blob/bhyve-loader-multiboot2/usr.sbin/bhyve/amd64/loader_multiboot2.c