ovs-netmap forgotten?

Vincenzo Maffione v.maffione at gmail.com
Tue Jun 6 09:45:36 UTC 2017


2017-06-05 20:25 GMT+02:00 Harry Schmalzbauer <freebsd at omnilan.de>:

> Regarding Vincenzo Maffione's message of 05.06.2017 16:06 (localtime):
> > Hi Harry,
> >   I've done some investigation on this issue (just for fun), and I
> > think I may have found the problem.
> >
> > When using vlan interfaces, netmap uses the emulated adapter, as the
> > "vlan" driver is not netmap-enabled (and it cannot be).
> > To intercept RX packets, netmap replaces the "if_input" function pointer
> > field in the kernel "struct ifnet" (the struct representing a network
> > interface).
> > Note that you have an instance of "struct ifnet" for em0 (physical NIC),
> > and a different instance for each VLAN cloned interface (e.g. "vlan100")
> > on em0.
> > If you put vlan100 in netmap mode, netmap will replace the if_input of
> > vlan100, and not the if_input of em0. So far, this is expected behaviour.
> >
> > Unfortunately, I see in the code here
> >
> > https://github.com/freebsd/freebsd/blob/master/sys/net/if_vlan.c#L1244-L1245
> >
> > that when the VLAN driver intercepts the RX packet coming from the
> > underlying interface (e.g. em0 in our example), the em0 if_input is used
> > rather than the vlan100 if_input.
> >
> > In terms of code, we have
> >   (*ifp->if_input)(ifv->ifv_ifp, m);
> > rather than
> >   (*ifv->ifv_ifp->if_input)(ifv->ifv_ifp, m);
> > Since the em0 if_input is not replaced, netmap does not intercept the
> > packet and you don't see it in your application, e.g.
> >
> > # pkt-gen -i vlan100 -f rx
> >
> > will see nothing.
> >
> > Now, I think that normally ifv->ifv_ifp->if_input == ifp->if_input, so this
> > may explain why the code is written like that (to avoid the additional
> > pointer dereferencing).
> > This is not the case for netmap, where ifv->ifv_ifp->if_input !=
> > ifp->if_input when exactly one of em0 and vlan100 is in netmap mode.
> >
> > You may try to recompile the kernel with that change and see if you can see
> > packets coming in on vlan100 with pkt-gen.
> > I recommend always testing with pkt-gen before trying to use
> > vale-ctl -a.
>
> NICE :-) Thank you very much for your effort and the impressive analysis
> done just by reading the code.
> Maybe one has to be used to ifv, ifp and companion variables, or I can't
> see _the_ simplicity of the code, or everybody else is a genius...
>
> First quick test shows you're right and this tiny diff solves a decent
> share of my (ESXi-replacing) problems:
>
> --- src/sys/net/if_vlan.c.orig  2017-06-05 17:39:27.770574000 +0200
> +++ src/sys/net/if_vlan.c       2017-06-05 17:39:21.550278000 +0200
> @@ -1234,7 +1234,7 @@
>         if_inc_counter(ifv->ifv_ifp, IFCOUNTER_IPACKETS, 1);
>
>         /* Pass it back through the parent's input routine. */
> -       (*ifp->if_input)(ifv->ifv_ifp, m);
> +       (*ifv->ifv_ifp->if_input)(ifv->ifv_ifp, m);
>  }
>
>  static int
>
> Will do real-world tests tomorrow.
>

We may ask the VLAN developers whether that ifp->if_input is really
necessary or we can replace it with ifv->ifv_ifp->if_input.
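
To illustrate the difference, here is a minimal userspace sketch (not kernel
code). The struct layouts, the counters and the helper names (stack_input,
netmap_hook_input, vlan_input_parent, vlan_input_own, netmap_rx_count) are
simplified stand-ins made up for this example; only the two call expressions
are taken from if_vlan.c and from the proposed change.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures; NOT the real definitions. */
struct mbuf { int len; };

struct ifnet {
	const char *name;
	void (*if_input)(struct ifnet *, struct mbuf *);
	int stack_rx;	/* packets delivered to the regular network stack */
	int netmap_rx;	/* packets intercepted by the emulated netmap hook */
};

struct ifvlan {
	struct ifnet *ifv_ifp;	/* the cloned VLAN interface, e.g. vlan100 */
};

/* Hypothetical default input routine: hand the packet to the stack. */
static void stack_input(struct ifnet *ifp, struct mbuf *m)
{
	(void)m;
	ifp->stack_rx++;
}

/* Hypothetical replacement installed when an interface enters netmap mode. */
static void netmap_hook_input(struct ifnet *ifp, struct mbuf *m)
{
	(void)m;
	ifp->netmap_rx++;
}

/* Current code path: call the parent's (em0) if_input on the vlan ifnet. */
static void vlan_input_parent(struct ifnet *ifp, struct ifvlan *ifv,
    struct mbuf *m)
{
	assert(ifp != NULL && ifv != NULL && ifv->ifv_ifp != NULL);
	(*ifp->if_input)(ifv->ifv_ifp, m);
}

/* Proposed code path: call the vlan interface's own if_input. */
static void vlan_input_own(struct ifnet *ifp, struct ifvlan *ifv,
    struct mbuf *m)
{
	(void)ifp;
	assert(ifv != NULL && ifv->ifv_ifp != NULL);
	(*ifv->ifv_ifp->if_input)(ifv->ifv_ifp, m);
}

/*
 * Put vlan100 (but not em0) in "netmap mode", receive one packet, and
 * return how many packets the netmap hook saw on vlan100.
 */
int netmap_rx_count(int use_own_input)
{
	struct ifnet em0 = { "em0", stack_input, 0, 0 };
	struct ifnet vlan100 = { "vlan100", stack_input, 0, 0 };
	struct ifvlan ifv = { &vlan100 };
	struct mbuf m = { 64 };

	vlan100.if_input = netmap_hook_input;	/* only vlan100 is hooked */

	if (use_own_input)
		vlan_input_own(&em0, &ifv, &m);
	else
		vlan_input_parent(&em0, &ifv, &m);

	return vlan100.netmap_rx;
}
```

In this model netmap_rx_count(0) returns 0, because the packet goes through
the unhooked em0 if_input and ends up in the regular stack, while
netmap_rx_count(1) returns 1, because the hooked vlan100 if_input is used.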


>
> Unrelated to the vlan-netmap issue, but more on topic:
> A recent small (completely non-academic) test unfortunately showed that
> "vtnet|virtio-net<-vale:guestif->netmapIF"
> can't compete with
> "vmx3f|vmxnet3<-ESXivSwitch->sameHWif".
> The latter causes no noticeable CPU load when NFS-copying big
> files via 1GbE, just like on a native host (which leaves the machine 99-100%
> idle @108MB/s).
> Running the same guest with the same task on bhyve causes ~20% CPU
> utilization @1GbE :-(


Yes, that is because of the offloads on the physical NIC and the offload
support in vmx*.


>


> Also there was no significant difference between vale(4) and
> if_bridge(4) with that workload (few IP packets/s on a saturated 1GbE PHY).
> Most likely the cause is the lack of offloading features, which leads to
> many more interrupts in the guest than with vmx3f's TSO capability.
> Haven't done any inter-VM "real-world" tests yet, where vale(4) will
> strike back...
>

Correct. if_tap does not support offloads at all. ptnet/VALE supports
them, but there is no offload support for netmap physical ports.


>
> So to achieve my goal of replacing my ESXi setups, I'd need your quick help
> again to port vmxnet3 ;-) /joking
>
> Hope ptnet can help out here, at least for FreeBSD guests, but as far as
> I could see, when merging netmap from HEAD to stable/11, (updated diff
> applicable after r319182 was available here too:
> ftp://ftp.omnilan.de/pub/FreeBSD/OmniLAN/misc/), bhyve(8) doesn't
> support ptnet yet.


The ptnet driver is already in HEAD.
bhyve support is not yet in HEAD, but it is available in the ptnet-head
branch here:

https://github.com/vmaffione/freebsd/tree/ptnet-head



> Is there any specific reason why ptnetmap-memdev
> (https://svnweb.freebsd.org/socsvn/soc2016/vincenzo/head/usr.sbin/bhyve/pci_ptnetmap_netif.c)
> hasn't been committed to HEAD?
>

That's a very good question. The bhyve code for ptnet has been ready for a
year, but I'm still waiting for the bhyve maintainers to commit it. I'll
raise the issue again at BSDCan over the weekend
(https://www.bsdcan.org/2017/schedule/events/814.en.html). I hope I'll find
people willing to commit this!



>
> Does anybody have an idea if there is any vmnet/vtnet companion (in
> development stage) providing offloading features and reducing wasted
> interrupts?
>
> Another question, better addressed to virtualization@, but I remember
> cross-posting is to be avoided:
> I never tried to understand why vmx3f seems to work without using
> interrupts at all, as opposed to vmx(4), but maybe it is possible to do
> the same for vtnet(4)?
>

The only way to avoid interrupts entirely is to do busy waiting or polling,
but nobody does that for general purpose networking because you waste CPU
or artificially increase latency.
So vmx* does use interrupts.
The way to optimize TCP performance between a VM and the external
physical network is to follow the QEMU virtio-net + vhost-net approach on
Linux
(http://blog.vmsplice.net/2011/09/qemu-internals-vhost-architecture.html),
which is similar to what ptnet does.
However, offloading support in if_tap is also needed (Linux does that).

Cheers,
  Vincenzo


> Thanks,
>
> -harry
>
>


-- 
Vincenzo Maffione

