Direct Linux loading for bhyve
Date: Sun, 05 May 2024 01:56:20 UTC
Hi all,

Last year I did some work on adding support to bhyve to load a Linux kernel directly, without needing to create a disk image or configure a bootloader. I showed a few people at the Dev Summit in Taipei in March, and the concept was generally well received, so I'm writing this email to describe where I'm at and where I want to take it, and to seek comments, ideas and guidance on how to proceed.

The initial motivation was to be able to do the equivalent of QEMU's -kernel, -append and -initrd options with bhyve, to boot a Linux kernel directly. (For me, it's to port my kernel dev tool "quiz"[1] to FreeBSD, though that is only tangentially related.) To do this I added a "loader" class to bhyve, and then wrote a loader that implements the Linux x86 boot protocol.

Some links:

* Prototype: https://github.com/robn/freebsd-src/tree/bhyve-loader-linux/usr.sbin/bhyve
* Demo run using the kernel and initrd from a Debian installer iso: https://asciinema.org/a/FuXehcd5MkWb7LE15s1VT2ugK

I'll describe how it's put together here. loader.h and loader.c define a trivial struct loader, which each loader module defines and adds to loader_set:

    struct loader {
        const char *l_name;
        int (*l_setup_memory)(struct vmctx *ctx);
        int (*l_setup_boot_cpu)(struct vmctx *ctx, struct vcpu *vcpu);
    };

    static const struct loader loader_linux = {
        .l_name = "linux",
        .l_setup_memory = loader_linux_setup_memory,
        .l_setup_boot_cpu = loader_linux_setup_boot_cpu,
    };
    LOADER_SET(loader_linux);

It's pretty straightforward: after memory is created, l_setup_memory() is called to load whatever is wanted into it. Then, once the boot CPU is created, l_setup_boot_cpu() is called to set initial registers and insert anything needed to hook up the final memory map, device state or whatever else. It's not so different from the existing bootrom support (indeed, an early version just set it up as an alternate bootrom). The details are in amd64/loader_linux.c.
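To give a flavour of what the Linux loader has to do first: the x86 boot protocol (Documentation/arch/x86/boot.rst in the kernel tree) puts a setup header at fixed offsets in the bzImage, and a loader has to validate it before deciding where things go. Here's a rough self-contained sketch of that first step; the function name is mine, not necessarily what's in the branch:

```c
/*
 * Sketch of the bzImage sanity checks a Linux loader does, per the x86
 * boot protocol. Offsets are from the start of the image; multi-byte
 * fields are little-endian. Illustrative only.
 */
#include <stddef.h>
#include <stdint.h>

#define SETUP_SECTS_OFF     0x1f1   /* u8: size of real-mode setup code */
#define SETUP_BOOT_FLAG_OFF 0x1fe   /* u16: must be 0xAA55 */
#define SETUP_HDR_MAGIC_OFF 0x202   /* u32: must be "HdrS" */

static uint16_t le16(const uint8_t *p) { return p[0] | (uint16_t)p[1] << 8; }
static uint32_t le32(const uint8_t *p)
{
    return p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 |
        (uint32_t)p[3] << 24;
}

/*
 * Return the offset of the protected-mode kernel payload within the
 * image, or -1 if the image does not look like a bzImage.
 */
long bzimage_payload_offset(const uint8_t *img, size_t len)
{
    if (len < 0x206)
        return (-1);
    if (le16(img + SETUP_BOOT_FLAG_OFF) != 0xAA55)
        return (-1);
    if (le32(img + SETUP_HDR_MAGIC_OFF) != 0x53726448) /* "HdrS" */
        return (-1);
    uint8_t setup_sects = img[SETUP_SECTS_OFF];
    if (setup_sects == 0)       /* old-protocol quirk: 0 means 4 */
        setup_sects = 4;
    /* boot sector + setup sectors precede the protected-mode kernel */
    return (((long)setup_sects + 1) * 512);
}
```

After this, the loader copies the payload to its load address, drops the command line and initrd somewhere sensible, and fills in the boot_params fields (type_of_loader, cmd_line_ptr, ramdisk_image, ...) pointing at them.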
For a second opinion, I wrote a loader_multiboot2.c[2], though it's not finished and not working properly. I suspect it's not very far away, but in any case it does show the shape.

Apart from the normal matters of style, documentation, testing, and other "productionising" tasks, there are at least two things to address:

* I need a way to create a VM that will be destroyed when bhyve exits, including if it is killed. KVM does it by binding the VM to a file descriptor; when the descriptor is closed, the VM is destroyed. That's a fairly common pattern in Linux for managing lifetimes of kernel resources from userspace. I'm new enough to FreeBSD to not know what the common pattern for that kind of thing is (pointers appreciated!). Regardless, this feels like it's mostly a case of plumbing.

* I don't have dedicated command line options yet. '-o loader.name=foo' is used to select the loader, and any other options the loader has to sort out by itself (eg the Linux loader knows 'loader.kernel', 'loader.initrd' and 'loader.cmdline'). Maybe we don't _need_ dedicated command line options; I don't know how to decide that.

And then there's a bunch of observations around possible future areas for improvement in both bhyve and libvmm. These aren't showstoppers for some form of loader support landing, but they're definitely places things could be better:

* The memory layout story doesn't seem very flexible. The Linux loader needs to stake out multiple different regions, but there's not really any help to know what's already claimed, or to claim a region in turn. I just have to pick some spots that probably aren't going to be used by any device mapping or similar. It's not that hard, but it seems like some kind of allocator concept might be useful. Not unlike the e820 allocator perhaps, but these aren't "physical" regions as such (that is, they shouldn't be exposed to the OS in the e820 table).
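To make the allocator idea concrete, here's a toy sketch of the kind of claim-tracking interface I mean: loaders claim guest-physical ranges, and an overlap is reported instead of silently clobbered. All names here are made up for illustration, not a proposed API:

```c
/*
 * Toy guest-physical region tracker. A claim of [start, start+size)
 * fails if it overlaps anything already claimed. Hypothetical names;
 * nothing here is bhyve API.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REGIONS 32

struct gpa_region {
    uint64_t start, end;    /* half-open: [start, end) */
};

static struct gpa_region regions[MAX_REGIONS];
static size_t nregions;

bool gpa_claim(uint64_t start, uint64_t size)
{
    uint64_t end = start + size;

    if (nregions == MAX_REGIONS || size == 0)
        return (false);
    for (size_t i = 0; i < nregions; i++) {
        /* two half-open ranges overlap iff each starts before
           the other ends */
        if (start < regions[i].end && regions[i].start < end)
            return (false);
    }
    regions[nregions++] = (struct gpa_region){ start, end };
    return (true);
}
```

The real thing would presumably also want a "find me a free range of this size/alignment" operation, and to pre-claim whatever bhyve itself already places (bootrom, device windows, and so on).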
* Semi-related, something that would help a lot is the ability to map a region of host memory directly into the guest address space. Then instead of copying things in, we could just mmap() and hand the pointer to the hypervisor directly. This isn't so important for a loader, but as far as I can tell it's a requirement if QEMU itself were ever to use bhyve/libvmm for acceleration, as it wants to set up its memory layout directly and then just hand it to the hypervisor. (Getting QEMU running is another side project I have on the go, so I've thought about this a little bit.)

* This loader concept does seem to have some overlap with the bootrom support, and maybe bootrom should be just another kind of loader. But maybe not, since a bootrom is a real device, not just stuff in memory.

* It's really, really hard to set up the register state properly. This might just be a reflection of the complexity of the problem, especially since I'm trying to set things up so the CPU starts in 64-bit long mode, and there are very few examples of that out there (even QEMU installs a tiny bootrom to have the guest do the mode transitions itself before bouncing into Linux). Regardless, helpers for building the GDT, setting segment shadow registers, control registers, etc, would make this sort of thing a lot easier. (Incidentally, there is some help for this within libvmm, but only for setting up a FreeBSD guest; that seems out of place.)

I think that's everything for now. I'm very interested in any thoughts, opinions, guidance or complaints people have. I also hang around in #bhyve on IRC and on the FreeBSD Discord, and I'll be at BSDCan later this month if you want to chat to me about it.

Cheers, Rob.

1. https://despairlabs.com/blog/posts/2024-03-04-quiz-rapid-openzfs-development/
2. https://github.com/robn/freebsd-src/blob/bhyve-loader-multiboot2/usr.sbin/bhyve/amd64/loader_multiboot2.c
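PS. Since the long-mode register setup came up above: a lot of the pain is just getting the segment descriptor encoding right. As a sketch of what a GDT-building helper might wrap, here's the bit-packing for an x86 segment descriptor and the usual minimal long-mode code/data entries (the helper name is hypothetical; the bit layout is per the Intel SDM):

```c
/*
 * Pack an 8-byte x86 segment descriptor. In 64-bit long mode the base
 * and limit are ignored for code/data segments; what matters is the
 * access byte and the L bit in the flags nibble.
 */
#include <stdint.h>

uint64_t
gdt_descriptor(uint32_t base, uint32_t limit, uint8_t access, uint8_t flags)
{
    uint64_t d = 0;

    d |= (uint64_t)(limit & 0xffff);                  /* limit[15:0]  */
    d |= (uint64_t)(base & 0xffffff) << 16;           /* base[23:0]   */
    d |= (uint64_t)access << 40;                      /* P/DPL/S/type */
    d |= (uint64_t)((limit >> 16) & 0xf) << 48;       /* limit[19:16] */
    d |= (uint64_t)(flags & 0xf) << 52;               /* G/D/L/AVL    */
    d |= (uint64_t)((base >> 24) & 0xff) << 56;       /* base[31:24]  */
    return (d);
}

/*
 * Long-mode code segment: present, ring 0, execute/read (access 0x9a),
 * L bit set (flags 0x2) -> 0x00209a0000000000.
 * Flat data segment: present, ring 0, read/write (access 0x92),
 * no flags -> 0x0000920000000000.
 */
```

The messier half, which this doesn't show, is mirroring those descriptors into the hidden segment registers (base/limit/access) on the vCPU, plus CR0.PE/PG, CR4.PAE and EFER.LME/LMA, which is exactly where helpers would earn their keep.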