warning of pending commit attempt.
zec at icir.org
Thu Feb 28 03:08:34 UTC 2008
On Wednesday 27 February 2008 00:24:26 Kris Kennaway wrote:
> Julian Elischer wrote:
> > Kris Kennaway wrote:
> >> Julian Elischer wrote:
> >>> Andre Oppermann wrote:
> >>>> Brooks Davis wrote:
> >>>>> On Mon, Feb 25, 2008 at 08:44:56PM -0800, Julian Elischer wrote:
> >>>>>> At some stage in the next few weeks I will be trying to commit
> >>>>>> Marko Zec's vimage code to -current. (only 'trying', not for
> >>>>>> technical reasons, but for political ones).
> >>>> ...
> >>>>>> Why now?
> >>>>>> The code is in a shape where the compiled-out version of the
> >>>>>> system is stable. In the compiled-in version, it is functional
> >>>>>> enough to provide nearly all of what people want. It needs
> >>>>>> people with other interests to adapt it to their purposes and
> >>>>>> use it so that it can become a solid product for future
> >>>>>> releases.
> >>>>> The website has a snapshot with a date over a month old and
> >>>>> many comments about unstable interfaces. I've seen zero
> >>>>> reports of substantial testing...
> >>>> What about locking and SMP scalability? Any new choke points?
> >>> not that I've seen.
> >> That's a less than resounding endorsement :)
> > do the 10Gb ethernet adapters have any major problems?
> > are you willing to answer "no"?
> > should we then rip them from the tree?
> Those are small, isolated components, so hardly the same thing as a
> major architectural change that touches every part of the protocol
> stack.
> But if someone came along and said "I am going to replace the 10ge
> drivers, but I dunno how well they perform" I'd say precisely the
> same thing.
> Presumably someone (if not you, then Marko) has enough of a grasp of
> the architectural changes being proposed to comment about what
> changes (if any) were made to synchronisation models, and whether
> there are new sources of performance overhead introduced.
> That person can answer Andre's question.
OK, first my apologies to everybody for being late in jumping into this
thread... I'll attempt to address a few of the questions raised so far,
in no particular order, but SMP scalability definitely tops the list...
I think it's safe to assume that network stack instances / vimages will
have lifetimes similar to those of jails, i.e. once they get
instantiated, in typical applications vimages would remain static over
extended periods of time, rather than being created and torn down
thousands of times per second like TCP sessions or sockets in general.
Hence, synchronizing access to the global vimage or vnet lists can
probably be accomplished using rmlocks, which are essentially free for
read-only consumers. The current code in p4 still uses a handcrafted
shared / exclusive refcounted locking scheme with the refcounts
protected by a spinlock, since in 7.0 we don't have rmlocks yet, but
I'll try converting those to rmlocks in the "official" p4 vimage
branch.
Another thing to note is that the frequency of read-only iterations over
vnets is also quite low - mostly this needs to be done only in
slowtimo(), fasttimo() and drain() networking handlers, i.e. only a
couple of times per second. All iteration points are easy to fgrep for
in the code given that they are always implemented using VNET_ITERLOOP
macros, which simply compile away when the kernel is built without
options VIMAGE. But most importantly, on the performance-critical
datapaths (i.e. socket - TCP - IP - link layer - device drivers, and
vice versa) no additional synchronization points / bottlenecks were
introduced. In fact, the framework opens up the possibility of
replicating some of the heavily contended locks across multiple vnets,
potentially reducing contention in cases where load would be evenly
spread over multiple vimages / vnets.
Other people have asked about vimages and jails: yes, it is possible to
run multiple jails inside a vimage / vnet, with the original semantics
of jails completely preserved.
Non-developers accessing the code: after freebsd.org's p4 to anoncvs
autosyncer died last summer I've been posting source tarballs every
few weeks on the project's somewhat obscure web site (that Julian has
advertised every now and then on this list): http://imunes.net/virtnet/
I've just dumped a diff against -HEAD there, and will post new tarballs
in a few minutes as well.
Impact of the changes on device drivers: in general no changes were
needed at the device driver layer, as drivers do not need to be aware
that they are running on a virtualized kernel. Each NIC is logically
attached to one and only one network stack instance at a time, and it
receives data from upper layers and feeds the upper layers with mbufs
in exactly the same manner as it does on the standard kernel. It is
the link layer that demultiplexes the incoming traffic to the
appropriate stack instance...
Overall, there's a lot of cleanup and possibly restructuring work left
to be done on the vimage code in p4, with documenting the new
interfaces probably being the top priority. I'm glad to see such a
considerable amount of (sudden) interest for pushing this code into the
main tree, so now that I've been smoked out of my rathole I'll be happy
to work with Julian and other folks to bring the vimage code closer to
CVS and help maintain it one way or another once it hopefully gets
there, be it weeks or months until we reach that point - the sooner the
better, of course.