802.1Q vlan performance.

Gleb Smirnoff glebius at FreeBSD.org
Fri Aug 25 10:11:00 UTC 2006


  Ian,

  big thanks for your testing and the feedback you provided. When I added
this option, I tested it with a small number of VLANs and expected it
to be a no-op. However, it turned out to be a regression. That's why I
concluded that with a moderate number of vlans it would be a regression
as well.

Probably on modern hardware with bigger CPU caches it isn't. I do
most of my performance testing on a quad PIII.

On Fri, Aug 25, 2006 at 09:37:46AM +0200, Ian FREISLICH wrote:
I> While doing some experimentation and work on ipfw to see where I
I> could improve performance for our virtualised firewall, I came across
I> the following comment in sys/net/if_vlan.c:
I> 
I>  * The VLAN_ARRAY substitutes the dynamic hash with a static array
I>  * with 4096 entries. In theory this can give a boots(sic) in processing,
I>  * however on practice it does not. Probably this is because array
I>  * is too big to fit into CPU cache.
I> 
I> Being curious and having determined the main throughput bottleneck
I> to be the vlan driver, I thought that I'd test the assertion.  I
I> have 506 vlans on this machine.
I> 
I> With VLAN_ARRAY unset, ipfw disabled, fastforwarding enabled,
I> vlanhwtag enabled on the interface, the fastest forwarding rate I
I> could get was 278kpps (this was a steady decrease from 440kpps with
I> 24 vlans, linearly proportional to the number of vlans).
I> 
I> With exactly the same configuration, but the vlan driver compiled
I> with VLAN_ARRAY defined, the forwarding rate of the system is back
I> at 440kpps.
I> The testbed looks like this:
I> 
I> |pkt gen |            | router |                    | pkt rec  |
I> | host   |vlan2 vlan2 |        |vlan1002   vlan1002 |  host    |
I> |netperf |----------->|        |------------------->| netserver|
I> |        |em0     em0 |        |em1             em0 |          |
I> 
I> The router has vlan2 to vlan264 and vlan1002 through vlan1264 in
I> 22 blocks of 23 vlan groups (a consequence of 24-port switches to
I> tag/untag for customers).  The pkt gen and receive host both
I> have 253 vlans.
I> 
I> Can anyone suggest a good reason not to turn this option on by
I> default?  It looks to me like it dramatically improves performance.

As Andrew said before, it consumes memory, and it looks like a
regression on a system with a small number of vlans.

However, after your email I see that we need to document this
option in vlan(4) and encourage people to try it when they are
building a system with a huge number of vlans.
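To spell out the trade-off I mean, here is a minimal user-space sketch
of the two lookup strategies. The names (struct vlan_softc, trunk_array,
trunk_hash, VLAN_HASH_SIZE) and the hash size are made up for
illustration; this is not the code from if_vlan.c.

/*
 * Illustrative sketch only -- not the real if_vlan.c identifiers,
 * just the shape of the two strategies.
 */
#include <stddef.h>
#include <stdint.h>

#define EVL_VLID_MASK   0x0FFF  /* 12-bit tag space: 4096 possible IDs */
#define VLAN_HASH_SIZE  32      /* assumed small table size */

struct vlan_softc {
    uint16_t           vid;
    struct vlan_softc *next;    /* hash chain link */
};

/*
 * VLAN_ARRAY style: one slot per possible tag.  Lookup is a single
 * index operation, but the table costs 4096 * sizeof(void *) per
 * trunk: 16 KB on a 32-bit box, 32 KB on a 64-bit one -- about the
 * size of a Pentium III's entire L1 data cache.
 */
struct vlan_softc *trunk_array[EVL_VLID_MASK + 1];

struct vlan_softc *
lookup_array(uint16_t tag)
{
    return (trunk_array[tag & EVL_VLID_MASK]);
}

/*
 * Hash style: a small table of chains.  Cheap on memory and cache,
 * but every received frame pays for the hash and a chain walk that
 * grows with the number of configured vlans.
 */
struct vlan_softc *trunk_hash[VLAN_HASH_SIZE];

struct vlan_softc *
lookup_hash(uint16_t tag)
{
    struct vlan_softc *sc;

    tag &= EVL_VLID_MASK;
    for (sc = trunk_hash[tag % VLAN_HASH_SIZE]; sc != NULL; sc = sc->next)
        if (sc->vid == tag)
            return (sc);
    return (NULL);
}

With several hundred vlans chained over a small hash, the per-packet
walk gets long, which matches the roughly linear slowdown Ian measured;
the array makes the lookup constant-time, and only the cache footprint
suffers.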

And here are some more performance thoughts on the vlan(4) driver.
When we process an incoming VLAN-tagged frame, we need either the
hash or the array to determine which VLAN the frame belongs to. When
we send a VLAN frame outwards, we don't need this lookup. I've run
some tests, and it looks like the performance decrease observed
between a bare Ethernet interface and a vlan(4) interface is mostly
caused by the transmit path: the packet is put on interface queues
twice. I hope this will be optimized after Robert Watson finishes
his if_start_mbuf work.
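
To make the "twice on interface queues" point concrete, here is a toy
user-space model of what the transmit path does now. The structures and
functions are simplified stand-ins that only mimic the shape of the
real ifnet/IFQ code, nothing here is the actual kernel API.

#include <stdio.h>
#include <stddef.h>

struct mbuf { int id; struct mbuf *next; };
struct ifqueue { struct mbuf *head, *tail; };
struct ifnet {
    const char     *name;
    struct ifqueue  snd;                    /* per-interface send queue */
    void          (*if_start)(struct ifnet *);
    struct ifnet   *parent;                 /* vlanX -> physical trunk */
};

static void
ifq_enqueue(struct ifqueue *q, struct mbuf *m)
{
    m->next = NULL;
    if (q->tail != NULL)
        q->tail->next = m;
    else
        q->head = m;
    q->tail = m;
}

static struct mbuf *
ifq_dequeue(struct ifqueue *q)
{
    struct mbuf *m = q->head;

    if (m != NULL) {
        q->head = m->next;
        if (q->head == NULL)
            q->tail = NULL;
    }
    return (m);
}

/* The physical driver's if_start: finally puts the frame on the wire. */
static void
em_start(struct ifnet *ifp)
{
    struct mbuf *m;

    while ((m = ifq_dequeue(&ifp->snd)) != NULL)
        printf("  %s: transmitting packet %d\n", ifp->name, m->id);
}

/*
 * vlan(4)'s if_start: pull the packet off the queue it was just put
 * on, add the 802.1Q tag (elided here), and queue it a second time on
 * the parent -- this is the second pass through an interface queue.
 */
static void
vlan_start(struct ifnet *ifp)
{
    struct mbuf *m;

    while ((m = ifq_dequeue(&ifp->snd)) != NULL) {
        printf("  %s: re-queueing packet %d on %s\n",
            ifp->name, m->id, ifp->parent->name);
        ifq_enqueue(&ifp->parent->snd, m);      /* enqueue #2 */
        (*ifp->parent->if_start)(ifp->parent);
    }
}

int
main(void)
{
    struct ifnet em0 = { "em0", { NULL, NULL }, em_start, NULL };
    struct ifnet vlan2 = { "vlan2", { NULL, NULL }, vlan_start, &em0 };
    struct mbuf m = { 1, NULL };

    ifq_enqueue(&vlan2.snd, &m);                /* enqueue #1, on vlan2 */
    (*vlan2.if_start)(&vlan2);
    return (0);
}

Presumably once drivers can accept an mbuf directly, the pass through
the vlan interface's own queue (enqueue #1 above) can be skipped.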

-- 
Totus tuus, Glebius.
GLEBIUS-RIPN GLEB-RIPE

