vkernel & GSoC, some questions
dillon at apollo.backplane.com
Mon Mar 17 21:43:41 UTC 2008
:> In all three cases the emulated hardware -- disk and network basically,
:> devolves down into calling read() or write() or the real-kernel
:> equivalent. A hypervisor has the most work to do since it is trying to
:> emulate a hardware interface (adding another layer). XEN has less work
:> to do as it is really not trying to emulate hardware. A vkernel has
:> even less work to do because it is running as a userland program and can
:> simply make the appropriate system call to implement the back-end.
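(Illustrative aside: the quoted point that the emulated disk back-end "devolves down into calling read() or write()" can be sketched as a virtual disk backed by an ordinary host file. The VirtualDisk class and sector layout below are hypothetical, purely for illustration, and not vkernel source.)

```python
# Hypothetical sketch: a guest block device whose "hardware" ops
# reduce to pread()/pwrite() on a backing file in the host.
import os
import tempfile

class VirtualDisk:
    """Back a guest block device with an ordinary host file."""
    SECTOR = 512

    def __init__(self, path, sectors):
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        os.ftruncate(self.fd, sectors * self.SECTOR)

    def read_sector(self, lba):
        # A guest "DMA read" is just a host pread() at the right offset.
        return os.pread(self.fd, self.SECTOR, lba * self.SECTOR)

    def write_sector(self, lba, data):
        assert len(data) == self.SECTOR
        return os.pwrite(self.fd, data, lba * self.SECTOR)

path = os.path.join(tempfile.mkdtemp(), "disk.img")
disk = VirtualDisk(path, sectors=8)
disk.write_sector(3, b"x" * 512)
```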
:And jails and similar have the absolute minimum..
:at the cost of making a single accessible point of failure
:(the one kernel).
Yes, absolutely. Jails have the greatest performance, though the
characterization of a single point of failure is a bit misleading. The
problem with a jail is that all programs running under it are directly
accessing the real kernel and are able to exercise *ALL* code paths into
that kernel, even many root code paths, and thus expose all the bugs
in that kernel. A vkernel or hypervisor uses only a subset of the
real kernel's functionality, resulting in much lower exposure to potential
kernel bugs. While a vkernel or kernel running under a hypervisor is
fully exposed, a failure of same does not cause the whole machine to
fail and a recovery 'reboot' can be as short as 5 seconds. The cost
Even if you were to instrument the kernel code with full resource control
(jailed memory use, I/O, descriptors, real-kernel memory use, etc)... even
if you were to do that, it still doesn't solve the bug exposure issue.
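(Resource control of the kind mentioned can be approximated from userland with setrlimit(); the sketch below lowers a child's descriptor limit before running it. The helper name and limit value are invented for illustration, and, as the text says, none of this addresses bug exposure.)

```python
# Hypothetical sketch: impose a per-"jail" resource limit (here,
# open descriptors) on a child process via setrlimit().
import resource
import subprocess
import sys

def limited_run(code, nofile_limit):
    """Run `code` in a child python with RLIMIT_NOFILE lowered."""
    def limit():
        # Runs in the child after fork(), before exec().
        resource.setrlimit(resource.RLIMIT_NOFILE,
                           (nofile_limit, nofile_limit))
    return subprocess.run([sys.executable, "-c", code],
                          preexec_fn=limit,
                          capture_output=True, text=True)

r = limited_run(
    "import resource;"
    "print(resource.getrlimit(resource.RLIMIT_NOFILE)[0])", 64)
```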
In any case, there are only two performance bottlenecks that really
matter for a vkernel or hypervisor: (1) system calls from virtualized
processes to their virtualized kernels, and (2) MMU invalid page faults.
The I/O path is a distant third, really requiring only a co-thread or
two for write()s to be made efficient.
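(The "co-thread or two for write()s" idea can be sketched as follows, in Python rather than C purely for illustration; the queue discipline and names are invented, not the vkernel's actual mechanism. The point is that only the co-thread ever blocks on the host write, so the main loop keeps running.)

```python
# Hypothetical sketch: one co-thread drains a queue of write()s so
# the main loop never blocks on disk I/O.
import os
import queue
import tempfile
import threading

wq = queue.Queue()

def write_cothread(fd):
    while True:
        item = wq.get()
        if item is None:              # shutdown sentinel
            break
        offset, data = item
        os.pwrite(fd, data, offset)   # only this thread blocks
        wq.task_done()

fd = os.open(os.path.join(tempfile.mkdtemp(), "img"),
             os.O_RDWR | os.O_CREAT, 0o600)
t = threading.Thread(target=write_cothread, args=(fd,), daemon=True)
t.start()

# The "main loop" queues writes and continues immediately.
for i in range(4):
    wq.put((i * 512, bytes([i]) * 512))
wq.join()   # wait here only so the effect is observable
```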
(1) and (2) are not easy problems to solve, mainly due to the need for
the real kernel to have exclusive access to the context when doing an
iret (a R/W shared mapping of the top of the kernel stack is a
security hole). I do think a read-only mapping might be doable,
particularly for the standard syscall path which only modifies EAX and
EDX in the critical path. That would cut the overhead in half.
<dillon at backplane.com>
More information about the freebsd-hackers mailing list