Ethernet Drivers: Question on Sending Received Packets to the FreeBSD Network Stack
rwatson at FreeBSD.org
Sat Feb 5 05:54:40 UTC 2011
On Thu, 3 Feb 2011, Julian Elischer wrote:
> On 2/3/11 10:08 AM, David Somayajulu wrote:
>> Hi All, while sending received Ethernet frames (the non-LRO case) to the
>> FreeBSD network stack via (struct ifnet *)->if_input((struct ifnet *),
>> (struct mbuf *)), is it possible to send multiple Ethernet frames in a
>> single invocation of the above callback function?
>> In other words, should the (struct mbuf *) above always correspond to a
>> single Ethernet frame? I am not sure if I missed something, but I gathered
>> from a quick perusal of ether_input() in net/if_ethersubr.c that only ONE
>> Ethernet frame may be sent per callback.
> Yes, only one. The linkages you see in the mbuf definition are for when you
> are putting it into some queue (interface, socket, reassembly, etc.).
> I had never considered passing a set of packets, but after my initial
> scoffing thoughts I realized that it would actually be a very interesting
> thought experiment to see if the ability to do that would be advantageous in
> any way. It may be a way to reduce some sorts of overhead if using interrupt
This was discussed quite a lot at the network-related devsummit sessions a few
years ago. One idea that was bandied about was introducing an mbuf vector
data structure (I'm sure at least two floated around in Perforce at some
point, and another ended up built into the Chelsio driver, I think). The idea
was that indirection through queues of mbufs is quite inefficient when you
want to pass packets around in sets and the set may not be in the cache.
Instead, vectors of mbuf pointers would be passed around, each entry being a
chain representing a packet.
I think one reason that idea never really went anywhere was that the use cases
were fairly artificial for anything other than link-layer bridging between
exactly two interfaces, or systems with exactly one high-volume TCP
connection. In most scenarios, packets may come in small bursts going to the
same destination (etc.), as is exploited by LRO, but not in a way that lets
you keep passing them around in sets that remain viable as you get above the
link layer. I seem to recall benchmarking one of the prototypes and finding
that it increased the working set in memory noticeably, since it effectively
meant much more queueing was taking place, whereas our current direct-dispatch
model helped latency a great deal. It could be that with more deferred
dispatch in a parallel setting it would help, but you'd definitely want to
take a measurement-oriented approach in looking at it any further.
(For bridged ethernet filtering devices that act as a "bump in the wire", it
might well prove a reasonable performance optimisation. I'm not sure if the
cxgb mvec implementation would be appropriate to this task or not.)