PERFORCE change 147425 for review
Julian Elischer
julian at elischer.org
Fri Aug 15 17:48:33 UTC 2008
Marko Zec wrote:
> http://perforce.freebsd.org/chv.cgi?CH=147425
>
> Change 147425 by zec at zec_tpx32 on 2008/08/15 08:06:14
>
> Add an intro section to the document, clarify a few issues,
> randomly s/virtual machine/virtual environment/ or vimage or
> vnet where appropriate.
THANKYOU!
>
> Affected files ...
>
> .. //depot/projects/vimage/porting_to_vimage.txt#6 edit
>
> Differences ...
>
> ==== //depot/projects/vimage/porting_to_vimage.txt#6 (text+ko) ====
>
> @@ -6,21 +6,94 @@
> ===================
>
> Vimage is a framework in the BSD kernel which allows a co-operating module
> -to present multiple instances of itself so that it can participate
> -in a virtual machine scenario.
> +to operate on multiple independent instances of its state so that it can
> +participate in a virtual machine / virtual environment scenario.
> +
> +The implementation approach taken by the vimage framework is a replacement
> +of selected global state variables with constructs that allow for the
> +virtualized state to be stored and resolved in appropriate instances of
> +module-specific container structures. The code operating on virtualized state
> +has to conform to a set of rules described further below, among other things
> +in order to allow for all the changes to be conditionally compilable, i.e.
> +permitting the virtualized code to fall back to operation on global state.
> +
> +The most visible change throughout the existing code is typically replacement
> +of direct references to global variables with macros; foo_bar thus becomes
> +V_foo_bar. V_foo_bar macros will resolve back to the foo_bar global in default
> +kernel builds, and alternatively to some_base_pointer->_foo_bar for "options
> +VIMAGE" kernel configs. Prepending of "V_" prefixes to variable references
> +helps in visual discrimination between global and virtualized state. The
> +framework extends the sysctl infrastructure to support access to virtualized
> +state through introduction of the SYSCTL_V family of macros; those also
> +automatically fall back to their standard SYSCTL counterparts in default
> +kernel builds. Transparent kldsym(2) lookups are provided to virtualized
> +variables explicitly marked for visibility to kldsym interface, which permits
> +userland binaries such as netstat to operate unmodified on "options VIMAGE"
> +kernels, though this may have wide security implications.
> +
> +The vimage struct is currently primarily a placeholder for pointers to
> +module-specific struct instances; currently V_NET (networking), V_CPU
> +(CPU scheduling), and V_PROCG (jail-style interprocess protection) major
> +module classes are defined. Each vimage module may or may not be further
> +split into minor or submodules; the networking subsystem (vimage id V_NET;
> +struct vnet) in particular is organized in submodules such as VNET_MOD_NET
> +(mandatory shared infrastructure: routing tables, interface lists etc.);
> +VNET_MOD_INET (IPv4 state including transport protocols); VNET_MOD_INET6,
> +VNET_MOD_IPSEC, VNET_MOD_IPFW, VNET_MOD_NETGRAPH etc. The speciality of
> +VNET submodules is in that they not only provide storage for virtualized
> +data, but also enforce ordering of initialization and cleanup. Hence, not
> +all submodules must necessarily allocate private storage for their specific
> +data; they may be defined solely to support proper initialization
> +ordering.
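The ordering role of submodules can be modelled in a few lines of userland C. This is only a sketch under stated assumptions: the MOD_* ids, struct modinfo and both helper functions are hypothetical stand-ins rather than the framework's own definitions, the dependency chain is assumed to be acyclic, and running cleanup in reverse attach order is an assumption made here, not something the document states. Note that the LOIF entry has no private storage and exists purely to pin down ordering.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical submodule ids; the real ones would live in sys/vimage.h. */
enum { MOD_NONE = -1, MOD_NET, MOD_LOIF, MOD_FOO, MOD_COUNT };

struct modinfo {
	int	id;
	int	dependson;	/* must be attached first, or MOD_NONE */
	size_t	struct_size;	/* 0: no private storage, ordering only */
};

static const struct modinfo mods[MOD_COUNT] = {
	[MOD_NET]  = { MOD_NET,  MOD_NONE, sizeof(long) },
	[MOD_LOIF] = { MOD_LOIF, MOD_NET,  0 },
	[MOD_FOO]  = { MOD_FOO,  MOD_LOIF, sizeof(int) },
};

/*
 * Attach a submodule after its prerequisite. The attach order is
 * recorded in order[]; the updated count of attached modules is returned.
 */
static int
mod_attach(int id, int attached[MOD_COUNT], int order[MOD_COUNT], int n)
{
	assert(id >= 0 && id < MOD_COUNT);
	if (attached[id])
		return (n);
	if (mods[id].dependson != MOD_NONE)
		n = mod_attach(mods[id].dependson, attached, order, n);
	attached[id] = 1;
	order[n] = id;
	return (n + 1);
}

/* Assumed cleanup policy: detach in the reverse of the attach order. */
static void
mod_detach_order(const int order[MOD_COUNT], int n, int out[MOD_COUNT])
{
	for (int i = 0; i < n; i++)
		out[i] = order[n - 1 - i];
}
```

Asking for MOD_FOO alone pulls in MOD_NET and then MOD_LOIF before it, which is the behaviour a dependency field is meant to guarantee.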
> +
> +Each process is associated with a vimage, and vimages currently hang off of
> +ucred-s. This relationship defines a process's administrative affinity
> +to a vimage and thus indirectly to all of its modules (NET, CPU, PROCG)
> +as well as to any submodules. All network interfaces and sockets hold
> +pointers back to their parent vnets; this relationship is obviously entirely
> +independent from proc->ucred->vimage bindings. Hence, when a process
> +opens a socket, the socket will get bound to a vnet instance hanging off of
> +proc->ucred->vimage->vnet, but once such a socket->vnet binding gets
> +established, it cannot be changed for the entire socket lifetime. Certain
> +classes of network interfaces (Ethernet in particular) can be assigned
> +from one vnet to another at any time. By definition all vnets are
> +independent and can communicate only if they are explicitly provided
> +with communication paths; currently only netgraph can be used to establish
> +inter-vnet datapaths.
> +
> +In network traffic processing the vnet affinity is defined either by the
> +inbound interface or by the socket / pcb -> vnet binding. However, there
> +are many functions in the network stack that cannot implicitly fetch
> +the vnet context from their standard arguments. Instead of explicitly
> +extending argument lists of such functions with a struct vnet *,
> +a per-thread variable td_vnet was introduced, which can be fetched via
> +the curvnet macro (#define curvnet curthread->td_vnet). The curvnet
> +context has to be set on entry to the network stack (socket operations,
> +packet reception, or timer-driven functions) and cleared on exit. This
> +must be done via the provided CURVNET_SET() / CURVNET_RESTORE() family of
> +macros, which allow for "stacking" of curvnet context setting and provide
> +additional debugging info in INVARIANTS kernel configs. In most cases
> +however a developer writing virtualized code will not have to set /
> +restore the curvnet context unless the code would include timer-driven
> +events, given that those are inherently vnet-contextless on entry.
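The "stacking" behaviour of the set / restore pair can be illustrated with a small userland model. The macro bodies below are simplified guesses written for this sketch, not the kernel's definitions: they omit the INVARIANTS debugging support, and a plain static variable stands in for the per-thread td_vnet field. The functions inner_id() and timer_handler() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct vnet { int id; };	/* stand-in for the kernel's struct vnet */

/*
 * Models curthread->td_vnet. The kernel keeps this per thread; a plain
 * static is enough for a single-threaded sketch.
 */
static struct vnet *td_vnet;
#define	curvnet	td_vnet

/* Simplified, assumed macro bodies: save the old context, then restore it. */
#define	CURVNET_SET(arg)	struct vnet *saved_vnet = curvnet; curvnet = (arg)
#define	CURVNET_RESTORE()	curvnet = saved_vnet

/* A callee that switches to its own vnet and puts the caller's back. */
static int
inner_id(struct vnet *vnet)
{
	int id;

	CURVNET_SET(vnet);
	id = curvnet->id;
	CURVNET_RESTORE();
	return (id);
}

/* A timer-style entry point: no context on entry, so it must set one. */
static int
timer_handler(struct vnet *outer, struct vnet *inner)
{
	int sum;

	CURVNET_SET(outer);
	sum = inner_id(inner);	/* nested set / restore */
	sum += curvnet->id;	/* the outer context is in effect again */
	CURVNET_RESTORE();
	return (sum);
}
```

Because each CURVNET_SET() saves the previous value in a local of its own block, nested calls unwind correctly and the thread ends with the context it started with.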
> +
> +
> +Converting / virtualizing existing code
> +=======================================
>
> There are several steps needed in virtualisation.
> +
> 1/ decide whether the module needs to be virtualised.
>
> if the module is a driver for specific hardware, it makes sense that
> there be only one instance of the driver as there is only one piece of
> physical hardware. There are changes in the networking code to allow
> - physical (or virtual) interfaces to be moved between virtual machines.
> - This generally requires NO changes to the network drivers of the classes
> + physical (or virtual) interfaces to be moved between vnets. This
> + generally requires NO changes to the network drivers of the classes
> covered (e.g. ethernet).
>
> 2/ decide if your module is part of one of the major module groups.
> - These are V_GLOBAL V_NET V_PROCG V_CPU.
> + These are currently V_NET V_PROCG V_CPU.
>
> The reader will note that the descriptions below use the acronym VNET
> a lot. The vimage system has been at this time broken into a number of
> @@ -32,11 +105,6 @@
> processors to it, but keep the same filesystem and network setup, or
> alternatively to share processors but to have virtualised networking.
>
> - The current code has a "vnet" pointer in the thread. It could be argued
> - that it should actually be a vimage.
> -
> - [comments from Marko here]
> -
> 3/ If the module is to be virtualised, decide which attributes of the
> module should be virtualised.
>
> @@ -51,26 +119,28 @@
> achieve the behaviour required for part #2.
>
> 5/ Work out for all the code paths through the module, how the path entering
> - the module can divine which virtual machine it is on.
> + the module can divine which virtual environment it is on.
>
> Some examples:
> - * Since interfaces are all assigned to one virtual machine or
> - another, an incoming packet has a pointer to the receive interface,
> - which in turn has a pointer to the virtual machine instance.
> + * Since interfaces are all assigned to one vnet or another, an incoming
> + packet has a pointer to the receive interface, which in turn has a
> + pointer back to the vnet.
> * Similarly, on any request from outside the kernel, (direct or indirect)
> - the current thread has a way to get to the current virtual machine
> - instance (easily referable as the "curvnet" macro).
> + the current thread has a way to get to the current virtual environment
> + instance via td->ucred->vimage. For existing sockets the vnet context
> + must be used via so->so_vnet since td->ucred->vimage might change after
> + socket creation.
> * Timer initiated actions usually have a (void *) argument which points to
> some private structure for the module. It should be possible to add
> - a pointer to the appropriate virtual machine instance into whatever
> - structure that points to.
> - * Sometimes an action (timer initiated or initiated by module load or
> - unload) simply has to check all the virtual machine instances.
> - There is a macro (pair) for this which will iterate through all the
> - virtual machine instances.
> + a pointer to the appropriate module instance into whatever structure
> + that points to.
> + * Sometimes an action (timer triggered or triggered by module load or
> + unload) simply has to check all the vimage or module instances.
> + There are macros (in pairs) for this which will iterate through all the
> + VNET or VPROCG instances.
>
> This covers most of the cases, however in some cases it may still be
> - required for the module to stash away the virtual machine instance
> + required for the module to stash away the virtual environment instance
> somewhere, and make associated changes in the code.
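The timer case from the list above can be sketched in userland C as follows. All names here (struct foo_softc, sc_vnet, foo_softc_init, foo_timeout) are hypothetical and only illustrate the idea of stashing the instance pointer in the private structure that the callback receives through its (void *) argument.

```c
#include <assert.h>
#include <stddef.h>

struct vnet { int ticks; };	/* stand-in for a per-instance structure */

struct foo_softc {		/* hypothetical module-private structure */
	struct vnet	*sc_vnet;	/* stashed when the softc is set up */
	int		 sc_fired;
};

static void
foo_softc_init(struct foo_softc *sc, struct vnet *vnet)
{
	sc->sc_vnet = vnet;
	sc->sc_fired = 0;
}

/*
 * Timer-style callback: only a void * arrives, so the instance has to
 * be recovered from the private structure it points to.
 */
static void
foo_timeout(void *arg)
{
	struct foo_softc *sc = arg;
	struct vnet *vnet = sc->sc_vnet;

	assert(vnet != NULL);
	vnet->ticks++;
	sc->sc_fired++;
}
```

Each softc carries its own pointer, so two timers firing for different instances never touch each other's state.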
>
> 6/ Add the code described below to the files that make up the module
> @@ -80,7 +150,7 @@
> temp. note: for module FOO add a definition for VNET_MOD_FOO in sys/vimage.h.
> These will eventually be dynamically assigned.
>
> -For now these instructions refer mainly to VNET and not VCPU etc.
> +For now these instructions refer mainly to VNET and not VCPU, VPROCG etc.
>
> Symbols defined in other modules that have been virtualised will have been
> moved to a module-specific virtualisation structure. It will be defined in a
> @@ -103,18 +173,19 @@
> When VIMAGE is compiled in, the macro will evaluate to an access to an
> element in a structure pointed to by a local variable.
> For this reason, it is necessary to also add, at the beginning of
> -these functions another MACRO that will instanciate this local variable
> +these functions another MACRO that will instantiate this local variable
> and point it at the correct place.
> -As an example, prior to using the "V_ifnet" structure, we must
> -add the following MACRO at the head of a code block enclosing the references.
> - INIT_VNET_NET(initial_value);
> +As an example, prior to using the "V_ifnet" structure in a program block,
> +we must add the following MACRO at the head of a code block enclosing the
> +references to set up the module-specific base pointer variable:
> + INIT_VNET_NET(initial_value);
>
> When VIMAGE is not defined, this will evaluate to nothing but when it
> IS defined, it will evaluate to:
> struct vnet_net *vnet_net = (initial_value);
>
> The initial value is usually something like "curvnet" which in turn
> -is a macro that derives the virtual machine reference from the current thread.
> +is a macro that derives the vnet affinity from the current thread.
> It could also be (m->m_ifp->if_vnet) if we were receiving an mbuf.
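A compilable userland model of this macro pair may make the expansion easier to picture. Everything below is an assumption made for illustration: struct vnet_foo, INIT_VNET_FOO and foo_bump() are hypothetical names patterned on the INIT_VNET_NET / V_ifnet description above, and VIMAGE is defined by hand here rather than by a kernel config.

```c
#include <assert.h>
#include <stddef.h>

#define	VIMAGE	1	/* remove to model a default, non-virtualized build */

struct vnet_foo {	/* hypothetical per-instance container */
	int	_foo_bar;
};

#ifdef VIMAGE
/* Virtualized build: resolve through a local base pointer. */
#define	INIT_VNET_FOO(base)	struct vnet_foo *vnet_foo = (base)
#define	V_foo_bar		(vnet_foo->_foo_bar)
#else
/* Default build: the INIT macro vanishes and V_foo_bar is the global. */
static int foo_bar;
#define	INIT_VNET_FOO(base)
#define	V_foo_bar		foo_bar
#endif

/* The function body is identical in both builds. */
static int
foo_bump(struct vnet_foo *instance)
{
	INIT_VNET_FOO(instance);

	assert(instance != NULL);
	V_foo_bar += 1;
	return (V_foo_bar);
}
```

With VIMAGE defined, two calls with different instance pointers update two separate counters; without it, every caller shares the single global.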
>
> In the case where it is just one function in a module calling
> @@ -125,17 +196,17 @@
> marked as "unused").
>
> Usually, when a packet enters the system it is carried through the processing
> -path via a single thread, and that thread will set its virtual machine
> +path via a single thread, and that thread will set its virtual environment
> reference to that indicated by the packet on picking up that new packet.
> This means that in the normal inbound processing path as well as the
> outgoing process path the current thread can be used to indicate the
> -current virtual machine. In the case of timer initiated events, best practice
> -would also be to set the current virtual machine reference to that indicated
> -calculated by whatever way that would be done, so that any functions called
> -could rely on the current thread being a good reference for the correct
> -virtual machine.
> +current virtual environment. In the case of timer initiated events, best
> +practice would also be to set the current virtual module reference,
> +determined by whatever means is appropriate, so that any functions
> +called can rely on the current thread being a good reference for the correct
> +virtual module.
>
> -When a new module is defined for virtualisation. The following
> +When a new VNET submodule is defined for virtualisation, the following
> structure defining macro is used to define it to the framework.
>
>
> @@ -150,17 +221,18 @@
> .vmi_struct_size = \
> sizeof(struct vnet_##m_name_lc), \
> .vmi_symmap = m_symmap \
> +
> The ID we allocated in the temporary first step in "Details" is
> -the first entry here. Eventually this should be automatically done
> +the first entry here; eventually this should be automatically done
> by module name. The DEPENDSON field tells us the order that modules
> -should be initialised in a new virtual machine. This may later need
> +should be initialised in a new virtual environment. This may later need
> to be changed to a list of text module names for dynamic calculation.
> -The rest of the fields are self explanatory..
> +The rest of the fields are self explanatory,
> with the exception of the symmap entry.
> The symmap allows us to intercept calls by libkvm to the
> linker when it is looking up symbols and to redirect it
> dynamically. This allows, for example, "netstat -r" to find the
> -routing tables for THIS virtual machine. (cute eh?)
> +routing tables for THIS virtual environment.
> (of course that won't work for core dumps). (XXX *needs thought *)
>
> As an example of virtualising a dummy module named the FOO module
> @@ -194,11 +266,13 @@
> #endif /* !_FOO_VFOO_H_ */
> =========================================================
>
> -For each time the foo module is initiated for a new virtual machine,
> +For each time the foo module is initiated for a new virtual environment,
> the foo_bar structure must be initiated, so new foo_creator and destructor
> functions are defined for the module. The Module will call these when a new
> -virtual machine is created or destroyed. The constructor must be called once
> -for the base machine when the system is booted, even when VIMAGE is not defined.
> +virtual environment is created or destroyed. The constructor must be called
> +once for the base machine when the system is booted, even when options VIMAGE
> +is not defined.
> +
> ==================== in module foo.c ======
> #include "opt_vimage.h"
> [...]
> @@ -229,7 +303,7 @@
>
> #ifdef VIMAGE
> /* If we have symbols we need to divert for libkvm
> - * then put them in here. We may net need to do anything if
> + * then put them in here. We may not need to do anything if
> * the symbols are not used by libkvm.
> */
> static struct vnet_symmap vnet_foo_symmap[] = {
> @@ -239,7 +313,7 @@
> };
> /*
> * Declare our module and state that we want to be done after the
> - * loopback interface is initialised for the virtual machine.
> + * loopback interface is initialised for the virtual environment.
> */
> VNET_MOD_DECLARE(FOO, foo, vnet_foo_iattach,
> vnet_foo_idetach, LOIF, vnet_foo_symmap)
> @@ -295,7 +369,7 @@
> /* Initialize everything. */
> /* put your code here */
> #ifdef VIMAGE
> - /* This will do the work for each virtual machine. */
> + /* This will do the work for each virtual environment. */
> vnet_mod_register(&vnet_foo_modinfo);
> #else /* !VIMAGE */
> #ifdef FUTURE
> @@ -309,7 +383,7 @@
> case MOD_UNLOAD:
> /* You can't unload it because an interface may be using it. */
> /* this needs work */
> - /* Should refuse to unload if any virtual machines */
> + /* Should refuse to unload if any virtual environments */
> /* are using this still. */
> /* MARKO, fill in here */
> error = EBUSY;