vkernel & GSoC, some questions
Matthew Dillon
dillon at apollo.backplane.com
Mon Mar 17 00:43:14 UTC 2008
:Finally, the way vkernels were implemented in dragonfly was *very*
:disruptive to the kernel source (lots of function renaming etc), so it
:is likely that this would also have to be completely reimplemented in a
:FreeBSD port.
:...
:Kris
Well, I don't think I would agree with your assessment; in particular,
the way vkernels are implemented in DragonFly is NOT in the least
disruptive to kernel source. It has about 1/10 the code pollution of
FreeBSD's current jail implementation. The implementation is just about
as clean as it is possible to make it from the point of view of code
pollution.
You could try reimplementing the concepts and APIs in a FreeBSD port,
but good luck with that. The 'pollution' involved, i.e. the kernel
shims needed, is fairly minor:
* VM fault code detects fault in special mmap entry type and passes
control to the virtual kernel.
* Trap code tests that the fault occurred in a managed VM context and
passes control to the virtual kernel.
* (real process) signal code checks that the signal occurred while running
a managed VM context and switches the context back to the virtual
kernel before taking the signal (duh! gotta do that!).
Note that there was some other work related to the vkernel, such
as signal mailboxes, but those aren't actually needed to port the vkernel,
though you do need some way to properly deal with scheduling races
without having to make signal blocking and unblocking system calls
(which make system calls made by a virtualized process even more
expensive).
No matter how you twist it, you can't avoid any of that. The added APIs
are:
* mmap supporting emulated user-accessible page tables.
This is unavoidable. There is no way a user process can control
virtualized processes without page-level control of their pages or
without page-level sharing of pages, with separate access domains,
between the virtual kernel process and the virtualized user process
running under it.
Not only does a virtual kernel need to be able to manipulate pages
within the virtualized VM context (representing a virtualized process),
but it must also be able to manipulate pages within its OWN context
to properly share pages between the virtual kernel and virtualized
processes, or it can't do things like, oh, implement mmap()ing of files
which have pages in both places, let alone implement the buffer cache.
I did have an issue with mmap() in that 32-bit ranges are not supported
by the current mmap code; i.e., I can't tell it in a single mmap()
to map a 3G chunk of memory. I hacked around that... the vkernel code just
does three adjacent mmap()'s to map the emulated address space in
the VM context. Hokey but it works. That's not really a kernel
pollution issue anyway since it is in the vkernel platform code.
* syscalls to switch into and out of a managed VM context.
Kinda need to be able to control the virtualized contexts.
* syscalls to manipulate managed VM contexts.
Kinda need to be able to manipulate page-by-page mappings within
managed VM contexts.
* signal mailboxes (the only thing that could be done away with, really),
used to avoid the vkernel having to block and unblock signals.
The most complex part of the whole mess is the emulated page table
support added to mmap. I don't think there is any way to avoid it,
particularly if you intend to support SMP virtualization (which we do,
completely, even though it may lack performance). The MMU interactions
are tricky at best when one is trying to implement a virtual SMP kernel
running inside a real SMP kernel, because the real kernel MUST implement
real page tables inaccessible to the virtual kernel. Synchronizing page
table modifications between the emulated and real page tables on SMP
is *NOT* trivial but, hey, I wrote it so you guys have a working
template for all that crap now. It took something like two months
to make it work properly in an SMP environment.
Now one thing you can do, which I considered but ultimately discarded,
is to associate the managed VM context with a real kernel process
separate from the virtual kernel process. This does simplify the
signal processing somewhat and I believe it may also reduce context
switch overhead slightly. The reason I discarded it was twofold: First,
for an SMP build there are now two real processes per cpu instead of
one, making scheduling more complex. Second, the emulated page table
is not confined to the VM contexts under the virtual kernel's control;
the virtual kernel itself uses the same feature, so additional MP-related
synchronization would have to occur to properly emulate the MMU, and I
got a headache trying to think about how to do it.
What I strongly recommend you NOT do is try to associate each virtualized
process running under the virtual kernel with a real-kernel process. The
reason is that it is extremely wasteful of real-kernel resources and
exposes the real kernel to resource starvation originating in the virtual
kernel. My solution was to separate struct vmspace out from everything
else and give it its own API. This isn't pollution... really it is a major
clean-up and we already had partial separation due to our 'resident'
code support. It was easy and cleaned up a chunk of the kernel source
at the same time. In any case, unless you do a 1:1 process model for the
emulated processes you need the code to swap VM spaces for a process.
-Matt
Matthew Dillon
<dillon at backplane.com>
More information about the freebsd-hackers mailing list