netmap custom RSS and custom packet info

Luigi Rizzo rizzo at iet.unipi.it
Mon Jun 29 17:34:52 UTC 2015


On Mon, Jun 29, 2015 at 6:22 PM, Slawa Olhovchenkov <slw at zxy.spb.ru> wrote:

> On Mon, Jun 29, 2015 at 06:05:41PM +0200, Luigi Rizzo wrote:
>
> > On Mon, Jun 29, 2015 at 5:17 PM, Slawa Olhovchenkov <slw at zxy.spb.ru>
> wrote:
> >
> > > Working with netmap and modern hardware, I am missing some features:
> > >
> > > a) some spare space before the packet (64/128/192/256 bytes) for
> > > application data. For example: the application does some pre-analysis
> > > of the packet, fills a structure in this space and routes the packet
> > > (via a NETMAP pipe) to another thread. The receiving thread gets the
> > > packet together with the linked information about it and can process
> > > it without additional overhead.
> > >
> >
> > spare space in front of the packet is something we have
> > been considering for a different purpose, namely better
> > support for encapsulation/decapsulation and things like
> > vhost-net header.
>
> Adding more space (sysctl- or ioctl-controlled) may satisfy both:
> 4-8-20 bytes for encapsulation and the rest for the application.
>
> > Note though that the annotation is transferred for free
> > only in the case of pipes or ports sharing the same memory
> > region; vale ports would have to explicitly copy the
> > extra bytes which is (moderately) expensive.
>
> I think these bytes should not be transferred through VALE.
> This is only packet-processing information, like tags, as opposed to
> VALE, which behaves like a packet transferred over a wire.
>


> > A quick and dirty way to support what you want is the following:
> > - in the kernel code, modify NMB(), PNMB() and the offset between
> >   the netmap_ring and the first buffer to add the extra space
> >   you want in front of the packet. You can possibly make this
> >   offset a sysctl-controlled value
> >
> > - in netmap_vale.c, make a small change to the code that copies
> >   buffers so that it includes also the space before the actual packet.
> >
> > That should be all.
>
> Do you plan to do this?
> I don't like maintaining permanent private branches/patches.
>

possibly in the long term yes, but before doing it
i want to design it properly so that it does not
look like a custom hack.
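
A userspace sketch of the "spare space in front of the packet" idea
discussed above. Nothing here is netmap API: PKT_HEADROOM, PKT_BUF_SIZE,
struct pkt_note and the helper names are all hypothetical, and the flat
buffer only stands in for a netmap buffer whose payload starts
PKT_HEADROOM bytes past its base.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes; neither constant exists in netmap. */
#define PKT_HEADROOM 64
#define PKT_BUF_SIZE 2048

/* Hypothetical annotation written by the pre-analysis thread. */
struct pkt_note {
    uint32_t flow_hash;   /* result of the application's own hashing */
    uint16_t l3_offset;   /* where the inner L3 header starts */
    uint16_t flags;
};

/* The annotation must fit in the space reserved before the packet. */
static_assert(sizeof(struct pkt_note) <= PKT_HEADROOM,
              "annotation larger than headroom");

/* Given the base of a buffer, return where the packet itself starts. */
static uint8_t *pkt_data(uint8_t *buf_base)
{
    return buf_base + PKT_HEADROOM;
}

/* Store the annotation in the headroom just before the packet. */
static void pkt_note_store(uint8_t *pkt, const struct pkt_note *n)
{
    memcpy(pkt - PKT_HEADROOM, n, sizeof(*n));
}

/* Read the annotation back; only the packet pointer is needed. */
static void pkt_note_load(const uint8_t *pkt, struct pkt_note *n)
{
    memcpy(n, pkt - PKT_HEADROOM, sizeof(*n));
}
```

With buffers laid out this way, a thread that receives the packet
pointer through a pipe sharing the same memory region can recover the
annotation without any extra copy, which is the "for free" case
mentioned above.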


> > > b) custom RSS. The RSS of modern NICs interoperates poorly with packet
> > > analysis: packets from the same flow but in different directions are
> > > placed in different queues, PPPoE-encapsulated packets are placed in
> > > queue 0, various tunneling protocols are not recognised, etc. Maybe
> > > NETMAP could use a custom RSS hashing function from a loadable kernel
> > > module provided by the user? A function from this module could do
> > > packet analysis, tunnel removal and custom, direction-independent RSS
> > > hashing, fill some structure prepended to the buffer (see above) and
> > > pass this information to the application.
> > >
> >
> > RSS is completely orthogonal to netmap and I strongly
> > suggest to keep it this way, using either the NIC-specific
> > tools to control RSS or some generic mechanism
> > (on linux there is ethtool, and we should implement something
> > similar also on freebsd).
>
> This is not true RSS. It is only a trick for reassigning RX packets to
> different netmap rings. All RSS mechanisms available in hardware are
> unacceptable for this:
>
> - they don't support different encapsulations (PPPoE, GRE, GTP, etc.)
> - they give different rings for packets 1.2.3.4->5.6.7.8 and 5.6.7.8->1.2.3.4
>
> Producing a universal hashing/distribution mechanism is too complex. But
> maybe using a user-provided kernel module (synced to the application)
> would be acceptable?
>

> This is like an ephemeral permanent NETMAP pipe between the real hardware
> RX rings/driver and the application-visible rings.
>


this particular function would also need to deal with
notifications between the physical NIC and the exported
netmap rings, and i would probably leave it to userspace.

You should be able to do what you have in mind using the
programmable forwarding function that already exists
for VALE ports (at the cost of a memory copy, which could
be avoided when/if we decide to support VALE ports that
share the same memory region, hence using zero copy).

Don't hold your breath though.
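
For the userspace route, a direction-independent hash of the kind asked
for above could look roughly like the sketch below. It is only an
illustration: struct flow_key, flow_hash_symmetric() and ring_for_flow()
are hypothetical names, the key is assumed to be already extracted
(after any PPPoE/GRE/GTP decapsulation), and the mixing step is borrowed
from the MurmurHash3 finalizer rather than from any NIC's RSS algorithm.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flow key, filled in by the application's own parser. */
struct flow_key {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  proto;
};

/* 32-bit mixing step (MurmurHash3 finalizer). */
static uint32_t mix32(uint32_t h)
{
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

/*
 * Hash that is identical for both directions of a flow: the two
 * (ip, port) endpoints are put in a canonical order before hashing,
 * so swapping source and destination does not change the result.
 */
static uint32_t flow_hash_symmetric(const struct flow_key *k)
{
    uint64_t a = ((uint64_t)k->src_ip << 16) | k->src_port;
    uint64_t b = ((uint64_t)k->dst_ip << 16) | k->dst_port;
    uint64_t lo = a < b ? a : b;
    uint64_t hi = a < b ? b : a;
    uint32_t h;

    h = mix32((uint32_t)lo ^ k->proto);
    h = mix32(h ^ (uint32_t)(lo >> 32));
    h = mix32(h ^ (uint32_t)hi);
    h = mix32(h ^ (uint32_t)(hi >> 32));
    return h;
}

/* Pick one of nrings destination rings (or pipes) for this flow. */
static unsigned ring_for_flow(const struct flow_key *k, unsigned nrings)
{
    assert(nrings > 0);
    return flow_hash_symmetric(k) % nrings;
}
```

A dispatcher thread would read from the hardware rings, build the key,
and forward each packet to the ring chosen by ring_for_flow(), so that
1.2.3.4->5.6.7.8 and 5.6.7.8->1.2.3.4 land on the same worker.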

cheers
luigi



-- 
-----------------------------------------+-------------------------------
 Prof. Luigi RIZZO, rizzo at iet.unipi.it  . Dip. di Ing. dell'Informazione
 http://www.iet.unipi.it/~luigi/        . Universita` di Pisa
 TEL      +39-050-2217533               . via Diotisalvi 2
 Mobile   +39-338-6809875               . 56122 PISA (Italy)
-----------------------------------------+-------------------------------

