vkernel & GSoC, some questions

Mon Mar 17 21:43:41 UTC 2008

:>     In all three cases the emulated hardware -- disk and network basically,
:>     devolves down into calling read() or write() or the real-kernel
:>     equivalent.  A hypervisor has the most work to do since it is trying to
:>     emulate a hardware interface (adding another layer).  XEN has less work
:>     to do as it is really not trying to emulate hardware.  A vkernel has
:>     even less work to do because it is running as a userland program and can
:>     simply make the appropriate system call to implement the back-end.
:
:And jails and similar have the absolute minimum..
:at the cost of making a single accessible point of failure
:(the one kernel).
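
    To make the quoted back-end point concrete, here is a minimal sketch of
    a virtual disk whose sector I/O devolves to pread()/pwrite() on an
    ordinary file.  The vdisk_* names, the 512-byte sector size and the
    file-backed layout are assumptions made for illustration; this is not
    the actual vkernel disk driver.

```c
/* Sketch only: vdisk_* names and VD_SECSIZE are hypothetical. */
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define VD_SECSIZE 512          /* assumed sector size */

struct vdisk {
    int fd;                     /* backing file on the real kernel */
};

/* Open (or create) the backing file for the virtual disk. */
static int vdisk_open(struct vdisk *vd, const char *path)
{
    assert(vd != NULL && path != NULL);
    vd->fd = open(path, O_RDWR | O_CREAT, 0600);
    return vd->fd < 0 ? -1 : 0;
}

/* A sector write is just one pwrite() at the matching file offset. */
static int vdisk_write(struct vdisk *vd, uint64_t sector, const void *buf)
{
    ssize_t n = pwrite(vd->fd, buf, VD_SECSIZE, (off_t)(sector * VD_SECSIZE));
    return n == VD_SECSIZE ? 0 : -1;
}

/* A sector read is one pread(); a short read is treated as failure. */
static int vdisk_read(struct vdisk *vd, uint64_t sector, void *buf)
{
    ssize_t n = pread(vd->fd, buf, VD_SECSIZE, (off_t)(sector * VD_SECSIZE));
    return n == VD_SECSIZE ? 0 : -1;
}

static void vdisk_close(struct vdisk *vd)
{
    close(vd->fd);
    vd->fd = -1;
}
```

    There is no hardware interface to emulate here: the guest-side block
    layer calls straight into a system call on the real kernel.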

    Yes, absolutely.  Jails have the greatest performance, though the 
    characterization of a single point of failure is a bit misleading.  The
    problem with a jail is that all programs running under it are directly
    accessing the real kernel and are able to exercise *ALL* code paths into
    that kernel, even many root code paths, and thus expose all the bugs
    in that kernel.  A vkernel or hypervisor uses only a subset of the
    real kernel's functionality, resulting in much lower exposure to potential
    kernel bugs.  While a vkernel or kernel running under a hypervisor is
    fully exposed, a failure of same does not cause the whole machine to
    fail and a recovery 'reboot' can be as short as 5 seconds.  The cost
    is performance.

    Even if you were to instrument the kernel code with full resource control
    (jailed memory use, I/O, descriptors, real-kernel memory use, etc.), it
    still wouldn't solve the bug exposure issue.

    In any case, there are only two performance bottlenecks that really
    matter for a vkernel or hypervisor:  (1) system calls from virtualized
    processes to their virtualized kernels, and (2) MMU invalid page faults.
    The I/O path is a distant third, really requiring only a co-thread or
    two for write()s to be made efficient.
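
    As a rough sketch of the co-thread idea, the main thread can queue
    write requests and let a helper thread issue the blocking write()
    calls.  Everything named wq_* below, the queue depth and the buffer
    size are assumptions for illustration, written against POSIX threads;
    it is not the vkernel's actual code.

```c
/* Sketch only: all wq_* names and sizes are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

#define WQ_SLOTS 16             /* assumed queue depth */
#define WQ_BUFSZ 512            /* assumed max bytes per request */

struct wreq { int fd; size_t len; char buf[WQ_BUFSZ]; };

struct wqueue {
    pthread_mutex_t lock;
    pthread_cond_t  nonempty, nonfull;
    struct wreq     slot[WQ_SLOTS];
    size_t          head, count;    /* ring buffer state */
    size_t          done;           /* total bytes written so far */
    int             shutdown;
    pthread_t       thread;
};

/* Co-thread: dequeue requests and perform the blocking write()s. */
static void *wq_cothread(void *arg)
{
    struct wqueue *q = arg;
    struct wreq r;

    for (;;) {
        pthread_mutex_lock(&q->lock);
        while (q->count == 0 && !q->shutdown)
            pthread_cond_wait(&q->nonempty, &q->lock);
        if (q->count == 0) {        /* shutdown requested, queue drained */
            pthread_mutex_unlock(&q->lock);
            return NULL;
        }
        r = q->slot[q->head];
        q->head = (q->head + 1) % WQ_SLOTS;
        q->count--;
        pthread_cond_signal(&q->nonfull);
        pthread_mutex_unlock(&q->lock);

        /* The blocking system call runs outside the lock. */
        size_t off = 0;
        while (off < r.len) {
            ssize_t n = write(r.fd, r.buf + off, r.len - off);
            if (n <= 0)
                break;              /* error handling omitted in this sketch */
            off += (size_t)n;
        }
        pthread_mutex_lock(&q->lock);
        q->done += off;
        pthread_mutex_unlock(&q->lock);
    }
}

static int wq_start(struct wqueue *q)
{
    assert(q != NULL);
    memset(q, 0, sizeof(*q));
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->nonempty, NULL);
    pthread_cond_init(&q->nonfull, NULL);
    return pthread_create(&q->thread, NULL, wq_cothread, q) == 0 ? 0 : -1;
}

/* Called by the main thread; blocks only when the queue is full. */
static int wq_submit(struct wqueue *q, int fd, const void *data, size_t len)
{
    if (len > WQ_BUFSZ)
        return -1;
    pthread_mutex_lock(&q->lock);
    while (q->count == WQ_SLOTS)
        pthread_cond_wait(&q->nonfull, &q->lock);
    struct wreq *r = &q->slot[(q->head + q->count) % WQ_SLOTS];
    r->fd = fd;
    r->len = len;
    memcpy(r->buf, data, len);
    q->count++;
    pthread_cond_signal(&q->nonempty);
    pthread_mutex_unlock(&q->lock);
    return 0;
}

/* Drain the queue, stop the co-thread, return total bytes written. */
static size_t wq_stop(struct wqueue *q)
{
    pthread_mutex_lock(&q->lock);
    q->shutdown = 1;
    pthread_cond_signal(&q->nonempty);
    pthread_mutex_unlock(&q->lock);
    pthread_join(q->thread, NULL);
    return q->done;
}
```

    The caller pays only for a memcpy() and a wakeup; the time spent
    blocked inside write() is absorbed by the helper thread.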

    (1) and (2) are not easy problems to solve, mainly due to the need for
    the real kernel to have exclusive access to the context when doing an
    iret (a R/W shared mapping of the top of the kernel stack is a
    security hole).  I do think a read-only mapping might be doable,
    particularly for the standard syscall path which only modifies EAX and
    EDX in the critical path.  That would cut the overhead in half.

					-Matt
					Matthew Dillon 
					<dillon at backplane.com>