PERFORCE change 125169 for review
Marko Zec
zec at icir.org
Wed Aug 15 11:45:09 PDT 2007
On Wednesday 15 August 2007 17:50, Julian Elischer wrote:
> Marko Zec wrote:
> > http://perforce.freebsd.org/chv.cgi?CH=125169
> >
> > Change 125169 by zec at zec_tpx32 on 2007/08/15 11:28:57
> >
> > Defer dispatching of netisr handlers for mbufs which have
> > crossed a boundary between two vnets. Direct dispatching
> > in such cases could lead to various LORs, or in most
> > extreme circumstances cause the kernel stack to overflow.
> >
> > This is accomplished by the introduction of a new mbuf
> > flag, M_REMOTE_VNET, which must be set by any kernel entity
> > moving an mbuf from one vnet context to another. So far
> > only ng_eiface and ng_wormhole can operate across a
> > boundary between vnets, so update those two accordingly.
> > The flag is then evaluated in netisr_dispatch(), and if
> > set, the mbuf is queued for later processing instead of
> > being directly dispatched to the netisr handler.
>
> Is it not possible for unix domain sockets to do so if the file
> descriptor is part of a filesystem that is shared?
As of now, AF_LOCAL sockets in different vnets are hidden from each
other using some existing jail infrastructure magic, so crossing a vnet
boundary over AF_LOCAL sockets would be rather difficult at the moment.
However, in private communication several people have already expressed
a wish to separate the AF_LOCAL virtualization from the rest of the
networking subsystems, and I agree that this option should be provided
soon, so this is in my todo pipeline. In any case, AF_LOCAL sockets
are not affected by this change, if that was part of your question,
i.e. all AF_LOCAL communication will still be direct-dispatched in
netisr_dispatch()...
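
To illustrate (this just restates the netisr.c hunk quoted below):
with options VIMAGE the direct dispatch test in netisr_dispatch()
effectively becomes

	if (netisr_direct && (ni->ni_flags & NETISR_MPSAFE) &&
	    !(m->m_flags & M_REMOTE_VNET)) {
		/* direct dispatch, as before */
		...
	} else {
		/* queue the mbuf for deferred processing */
		...
	}

so only mbufs that were marked while crossing a vnet boundary take the
deferred path; everything else, AF_LOCAL traffic included, is handled
exactly as before.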
> I hope soon (within a year) to have several vimages from a
> networking perspective but with a common filesystem root.
> the processes will communicate between themselves using
> Unix domain sockets. (That's what they currently do, but I want to
> make them have separate routing tables etc.)
Yes, that's why we do need separate virtualization of AF_LOCAL and the
other protocol families...
Cheers,
Marko
> > Affected files ...
> >
> > .. //depot/projects/vimage/src/sys/net/netisr.c#6 edit
> > .. //depot/projects/vimage/src/sys/netgraph/ng_eiface.c#7 edit
> > .. //depot/projects/vimage/src/sys/netgraph/ng_wormhole.c#2 edit
> > .. //depot/projects/vimage/src/sys/sys/mbuf.h#6 edit
> >
> > Differences ...
> >
> > ==== //depot/projects/vimage/src/sys/net/netisr.c#6 (text+ko) ====
> >
> > @@ -178,8 +178,19 @@
> > * from an interface but does not guarantee ordering
> > * between multiple places in the system (e.g. IP
> > * dispatched from interfaces vs. IP queued from IPSec).
> > + *
> > + * If the kernel was compiled with options VIMAGE, also defer
> > + * dispatch of netisr handlers for mbufs that have crossed a
> > + * boundary between two vnets. Direct dispatching in such
> > + * cases could lead to various LORs, or in most extreme
> > + * circumstances cause the kernel stack to overflow.
> > */
> > +#ifndef VIMAGE
> > if (netisr_direct && (ni->ni_flags & NETISR_MPSAFE)) {
> > +#else
> > + if (netisr_direct && (ni->ni_flags & NETISR_MPSAFE) &&
> > + !(m->m_flags & M_REMOTE_VNET)) {
> > +#endif
> > isrstat.isrs_directed++;
> > /*
> > * NB: We used to drain the queue before handling
> >
> > ==== //depot/projects/vimage/src/sys/netgraph/ng_eiface.c#7 (text+ko) ====
> >
> > @@ -253,6 +253,12 @@
> > continue;
> > }
> >
> > +#ifdef VIMAGE
> > + /* Mark up the mbuf if crossing vnet boundary */
> > + if (ifp->if_vnet != node->nd_vnet)
> > + m->m_flags |= M_REMOTE_VNET;
> > +#endif
> > +
> > /*
> > * Send packet; if hook is not connected, mbuf will get
> > * freed.
> > @@ -542,6 +548,12 @@
> > /* Update interface stats */
> > ifp->if_ipackets++;
> >
> > +#ifdef VIMAGE
> > + /* Mark up the mbuf if crossing vnet boundary */
> > + if (ifp->if_vnet != hook->hk_node->nd_vnet)
> > + m->m_flags |= M_REMOTE_VNET;
> > +#endif
> > +
> > (*ifp->if_input)(ifp, m);
> >
> > /* Done */
> >
> > ==== //depot/projects/vimage/src/sys/netgraph/ng_wormhole.c#2 (text+ko) ====
> >
> > @@ -378,11 +378,14 @@
> > priv_p priv = NG_NODE_PRIVATE(NG_HOOK_NODE(hook));
> > int error = 0;
> > priv_p remote_priv = priv->remote_priv;
> > + struct mbuf *m;
> >
> > if (priv->status != NG_WORMHOLE_ACTIVE) {
> > NG_FREE_ITEM(item);
> > error = ENOTCONN;
> > } else {
> > + m = NGI_M(item);
> > + m->m_flags |= M_REMOTE_VNET;
> > CURVNET_SET_QUIET(remote_priv->vnet);
> > NG_FWD_ITEM_HOOK(error, item, remote_priv->hook);
> > CURVNET_RESTORE();
> >
> > ==== //depot/projects/vimage/src/sys/sys/mbuf.h#6 (text+ko) ====
> >
> > @@ -192,6 +192,7 @@
> > #define M_LASTFRAG 0x2000 /* packet is last fragment */
> > #define M_VLANTAG 0x10000 /* ether_vtag is valid */
> > #define M_PROMISC 0x20000 /* packet was not for us */
> > +#define M_REMOTE_VNET 0x40000 /* mbuf crossed boundary between two vnets */
> >
> > /*
> > * External buffer types: identify ext_buf type.
> > @@ -214,7 +215,7 @@
> >
> >  #define M_COPYFLAGS	(M_PKTHDR|M_EOR|M_RDONLY|M_PROTO1|M_PROTO1|M_PROTO2|\
> >  			    M_PROTO3|M_PROTO4|M_PROTO5|M_SKIP_FIREWALL|\
> >  			    M_BCAST|M_MCAST|M_FRAG|M_FIRSTFRAG|M_LASTFRAG|\
> > -			    M_VLANTAG|M_PROMISC)
> > +			    M_VLANTAG|M_PROMISC|M_REMOTE_VNET)
> >
> > /*
> > * Flags to purge when crossing layers.