Advice on a multithreaded netisr patch?

Ivan Voras ivoras at
Sun Apr 5 10:29:46 PDT 2009

Robert Watson wrote:
> On Sun, 5 Apr 2009, Ivan Voras wrote:
>>>> I thought this has something to deal with NIC moderation (em) but
>>>> can't really explain it. The bad performance part (not the jump) is
>>>> also visible over the loopback interface.
>>> FYI, if you want high performance, you really want a card supporting
>>> multiple input queues -- igb, cxgb, mxge, etc.  if_em-only cards are
>>> fundamentally less scalable in an SMP environment because they
>>> require input or output to occur only from one CPU at a time.
>> Makes sense, but on the other hand - I see people are routing at least
>> 250,000 packets per second per direction with these cards, so they
>> probably aren't the bottleneck (pro/1000 pt on pci-e).
> The argument is not that they are slower (although they probably are a
> bit slower), rather that they introduce serialization bottlenecks by
> requiring synchronization between CPUs in order to distribute the work. 
> Certainly some of the scalability issues in the stack are not a result
> of that, but a good number are.

I'd like to understand more. If (in netisr) I have an mbuf with headers,
is this data already transferred from the card or is it magically "not
here yet"?

In the first case, the packet reception code path is unchanged up to the
point where the mbuf is queued on a thread, which handles it later (or is
the influence of "other" data like timers and internal TCP reassembly
buffers really so large?). In the second case, why?
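(To make my reading of the first case concrete, here's a toy sketch of
deferred dispatch: by the time the stack sees an mbuf, the NIC has
already DMA'd the frame into host memory, so "queuing on a thread" only
moves a descriptor around. The names -- toy_mbuf, netisr_enqueue,
netisr_worker -- are mine, not the real kernel API:)

```c
/*
 * Toy model of netisr-style deferred dispatch.  Hypothetical names,
 * not the real FreeBSD interfaces.
 */
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define QLEN 64

struct toy_mbuf {
    char   data[128];   /* payload is already in RAM (DMA'd by the NIC) */
    size_t len;
};

static struct toy_mbuf queue[QLEN];
static int q_head, q_tail;

/* Driver interrupt path: hand the already-received packet to netisr. */
static int
netisr_enqueue(const char *pkt, size_t len)
{
    int next = (q_tail + 1) % QLEN;

    if (next == q_head)
        return (-1);            /* queue full: drop */
    memcpy(queue[q_tail].data, pkt, len);
    queue[q_tail].len = len;
    q_tail = next;
    return (0);
}

/* Software-interrupt thread: drain and process packets later. */
static int
netisr_worker(void)
{
    int processed = 0;

    while (q_head != q_tail) {
        /* protocol processing (e.g. ip_input()) would happen here */
        q_head = (q_head + 1) % QLEN;
        processed++;
    }
    return (processed);
}
```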

> Historically, we've had a number of bottlenecks in, say, the bulk data
> receive and send paths, such as:
> - Initial receipt and processing of packets on a single CPU as a result
>   of a single input queue from the hardware.  Addressed by using
>   multiple input queue hardware with appropriately configured drivers
>   (generally the default is to use multiple input queues in 7.x and 8.x
>   for supporting hardware).
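(My understanding of why the multiqueue hardware helps, as a toy sketch:
the card hashes each flow's 4-tuple to pick an input queue, so all
packets of one flow land on the same CPU and the receive path needs no
cross-CPU locking. flow_hash() below is a cheap stand-in for the Toeplitz
hash real NICs use, not the actual algorithm:)

```c
/*
 * Toy sketch of NIC receive-queue selection by flow hashing.
 * flow_hash() is a stand-in, not the real Toeplitz hash.
 */
#include <assert.h>
#include <stdint.h>

#define NQUEUES 4               /* e.g. one input queue per CPU */

static unsigned
flow_hash(uint32_t src_ip, uint32_t dst_ip,
    uint16_t src_port, uint16_t dst_port)
{
    /* cheap multiplicative mixing; deterministic per flow */
    uint32_t h;

    h  = src_ip * 2654435761u;
    h ^= dst_ip * 2246822519u;
    h ^= (((uint32_t)src_port << 16) | dst_port) * 3266489917u;
    h ^= h >> 15;
    return (h % NQUEUES);       /* queue (and thus CPU) for this flow */
}
```

Since the hash is a pure function of the 4-tuple, a given TCP connection
always maps to the same queue, which is what keeps per-flow state on one
CPU.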

Since the card and the OS can already process many packets per second for
something as complex as routing, while TCP chokes with swi:net at 100% of
a core, isn't this an indication that there's certainly more room for
improvement even with single-queue, old-fashioned NICs?


More information about the freebsd-net mailing list