ovs-netmap forgotten?

Harry Schmalzbauer freebsd at omnilan.de
Mon Jun 5 18:25:19 UTC 2017


Regarding Vincenzo Maffione's message of 05.06.2017 16:06 (localtime):
> Hi Harry,
>   I've done some investigation on this issue (just for fun) , and I think I
> may have found the issue.
> 
> When using vlan interfaces, netmap uses the emulated adapter, as the "vlan"
> driver is not netmap-enabled (and it cannot be).
> To intercept RX packets, netmap replaces the "if_input" function pointer
> field in the kernel "struct ifnet" (the struct representing a network
> interface).
> Note that you have an instance of "struct ifnet" for em0 (physical NIC),
> and a different instance for each VLAN cloned interface (e.g. "vlan100") on
> em0.
> If you put vlan100 in netmap mode, netmap will replace the if_input of
> vlan100, and not the if_input of em0. So far, this is an expected behaviour.
> 
> Unfortunately, I see in the code here
> 
> https://github.com/freebsd/freebsd/blob/master/sys/net/if_vlan.c#L1244-L1245
> 
> that when VLAN driver intercepts the RX packet coming from the underlying
> interface (e.g. em0 in our example), the em0 if_input is used rather than
> the vlan100 if_input.
> 
> In terms of code, we have
>   (*ifp->if_input)(ifv->ifv_ifp, m);
> rather than
>   (*ifv->ifv_ifp->if_input)(ifv->ifv_ifp, m);
> Since em0 if_input is not replaced, netmap does not intercept it and you
> don't see it in your application, e.g.
> 
> # pkt-gen -i vlan100 -f rx
> 
> will see nothing.
> 
> Now, I think that normally ifv->ifv_ifp->if_input == ifp->if_input, so this
> may explain why the code is written like that (to avoid the additional
> pointer dereferencing).
> This is not the case for netmap, where ifv->ifv_ifp->if_input !=
> ifp->if_input when em0 xor vlan100 are in netmap mode.
> 
> You may try to recompile the kernel with that change and see if you can see
> packets coming on vlan100 with pkt-gen.
> I recommend always testing with pkt-gen before trying to use
> vale-ctl -a.

NICE :-) Thank you very much for your effort and the impressive
analysis, done just by reading the code.
Maybe one has to be used to ifv, ifp and companion variables, or I can't
see _the_ simplicity of the code, or everybody else is a genius...

A first quick test shows you're right, and this tiny diff solves a decent
share of my (ESXi-replacing) problems:

--- src/sys/net/if_vlan.c.orig  2017-06-05 17:39:27.770574000 +0200
+++ src/sys/net/if_vlan.c       2017-06-05 17:39:21.550278000 +0200
@@ -1234,7 +1234,7 @@
        if_inc_counter(ifv->ifv_ifp, IFCOUNTER_IPACKETS, 1);

        /* Pass it back through the parent's input routine. */
-       (*ifp->if_input)(ifv->ifv_ifp, m);
+       (*ifv->ifv_ifp->if_input)(ifv->ifv_ifp, m);
 }

 static int

Will do real-world tests tomorrow.

Unrelated to the vlan-netmap issue, but closer to the thread's topic:
My last little (completely non-academic) test unfortunately showed that
"vtnet|virtio-net<-vale:guestif->netmapIF"
can't compete with
"vmx3f|vmxnet3<-ESXivSwitch->sameHWif".
The latter causes no noticeable CPU load when NFS-copying big
files via 1GbE, just like on the native host (which leaves the machine
99-100% idle @108MB/s).
Running the same guest with the same task on bhyve causes ~20% CPU
utilization; @1GbE :-(

Also, there was no significant difference between vale(4) and
if_bridge(4) with that workload (low IP packet rate on a saturated 1GbE PHY).
The most likely cause is the lack of offloading features, which leads to
many more interrupts in the guest than with vmx3f's TSO capability.
I haven't done any inter-VM "real-world" tests yet, where vale(4) will
strike back...

So to achieve my goal of replacing my ESXi setups, I'd need your quick help
again to port vmxnet3 ;-) /joking

I hope ptnet can help out here, at least for FreeBSD guests, but as far as
I could see when merging netmap from HEAD to stable/11 (an updated diff,
applicable after r319182, is available here too:
ftp://ftp.omnilan.de/pub/FreeBSD/OmniLAN/misc/), bhyve(8) doesn't
support ptnet yet.

Is there any specific reason why ptnetmap-memdev
(https://svnweb.freebsd.org/socsvn/soc2016/vincenzo/head/usr.sbin/bhyve/pci_ptnetmap_netif.c)
hasn't been committed to HEAD?

Does anybody know whether there is any vmnet/vtnet companion (in
development stage) that provides offloading features and thus reduces
the interrupt overhead?

Another question, better addressed to virtualization@, but I remember
that cross-posting is to be avoided:
I never tried to understand why vmx3f seems to work without using
interrupts at all, as opposed to vmx(4), but maybe it is possible to do
the same for vtnet(4)?

Thanks,

-harry


