em network issues

Scott Long scottl at samsco.org
Fri Oct 20 00:35:18 UTC 2006


Bruce Evans wrote:
> On Thu, 19 Oct 2006, John Polstra wrote:
> 
>> On 19-Oct-2006 Scott Long wrote:
>>> The performance measurements that Andre and I did early this year showed
>>> that the INTR_FAST handler provided a very large benefit.
>>
>> I'm trying to understand why that's the case.  Is it because an
>> INTR_FAST interrupt doesn't have to be masked and unmasked in the
>> APIC?  I can't see any other reason for much of a performance
>> difference in that driver.  With or without INTR_FAST, you've got
>> the bulk of the work being done in a background thread -- either the
>> ithread or the taskqueue thread.  It's not clear to me that it's any
>> cheaper to run a task than it is to run an ithread.
> 
> It's very unlikely to be because masking in the APIC is slow.  The
> APIC is fast compared with the PIC, and even with the PIC it takes a
> very high interrupt rate (say 20 kHz) for the PIC overhead to become
> noticeable (say 5-10%).  Such interrupt rates may occur, but if they
> do you've probably already lost.
> 
> Previously I said that the difference might be due to interrupt
> coalescing but that I wouldn't expect that to happen.  Now I see how
> it can happen on loaded systems: the system might be so loaded that
> it often doesn't get around to running the task before a new device
> interrupt would occur if device interrupts weren't turned off.  The
> scheduling of the task might accidentally be best or good enough.  A
> task might work better than a software ithread accidentally because
> it has lower priority, and similarly, a software ithread might work
> better than a hardware ithread.  The lower-priority threads can also
> be preempted, at least with PREEMPTION configured.  This is bad for
> them but good for whatever preempts them.  Apart from this, it's _more_
> expensive to run a task plus an interrupt handler (even if the interrupt
> handler is fast) than to run a single interrupt handler, and more
> expensive to switch between the handlers, and more expensive again if
> PREEMPTION actually has much effect -- then more switches occur.
> 

That's all well and good, but the em task thread runs at the same
priority as a PI_NET ithread.  The whole taskqueue approach was just a
prototype on the way to ifilters.  I've demonstrated positive results
with it for the aac, em, and mpt drivers.
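
For concreteness, here is a minimal sketch of the pattern in question,
roughly as it looked on FreeBSD 6.x: the INTR_FAST handler only masks
the device and enqueues a task, and the deferred handler drains the
ring before unmasking, which is where the coalescing effect Bruce
describes comes from.  The "my_"-prefixed names, MY_DISABLE_INTR(),
MY_ENABLE_INTR(), and my_rxeof() are hypothetical stand-ins, not the
actual em(4) code:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/bus.h>
#include <sys/kernel.h>
#include <sys/malloc.h>
#include <sys/priority.h>
#include <sys/taskqueue.h>

/* Hypothetical hardware hooks; em(4) would touch the IMC/IMS registers. */
#define	MY_DISABLE_INTR(sc)	/* mask device interrupts */
#define	MY_ENABLE_INTR(sc)	/* unmask device interrupts */

struct my_softc {
	device_t		dev;
	struct resource		*irq_res;
	void			*irq_cookie;
	struct task		rx_task;
	struct taskqueue	*tq;
};

static void	my_rxeof(struct my_softc *);	/* hypothetical ring drain */

/*
 * INTR_FAST handler: runs in primary interrupt context, so it may not
 * sleep or take sleepable locks.  It masks the device and defers the
 * real work.  Packets that arrive while the task is pending are all
 * handled in the same task run, without further interrupts.
 */
static void
my_intr_fast(void *arg)
{
	struct my_softc *sc = arg;

	MY_DISABLE_INTR(sc);
	taskqueue_enqueue(sc->tq, &sc->rx_task);
}

/* Deferred handler: drain the RX ring in thread context, then unmask. */
static void
my_rx_task(void *arg, int pending)
{
	struct my_softc *sc = arg;

	my_rxeof(sc);
	MY_ENABLE_INTR(sc);
}

static int
my_setup_intr(struct my_softc *sc)
{
	TASK_INIT(&sc->rx_task, 0, my_rx_task, sc);
	sc->tq = taskqueue_create_fast("my_taskq", M_NOWAIT,
	    taskqueue_thread_enqueue, &sc->tq);
	/* The taskqueue thread runs at PI_NET, same as a net ithread. */
	taskqueue_start_threads(&sc->tq, 1, PI_NET, "%s taskq",
	    device_get_nameunit(sc->dev));

	return (bus_setup_intr(sc->dev, sc->irq_res,
	    INTR_TYPE_NET | INTR_MPSAFE | INTR_FAST,
	    my_intr_fast, sc, &sc->irq_cookie));
}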

Scott

>> A difference might show up if you had two or more em devices sharing
>> the same IRQ.  Then they'd share one ithread, but would each get their
>> own taskqueue thread.  But sharing an IRQ among multiple gigabit NICs
>> would be avoided by anyone who cared about performance, so it's not a
>> very interesting case.  Besides, when you first committed this
>> stuff, INTR_FAST interrupts were not sharable.
> 
> Sharing an IRQ among a single gigabit NIC and other slower devices is
> even less interesting :-).
> 
> It can be hard to measure performance, especially when there are a lot
> of threads or a lot of fast interrupt handlers.  If the performance
> benefits are due to accidental scheduling then they might vanish under
> different loads.
> 

It's easy to measure performance when you have a Smartbits traffic
generator: more kpps means more kpps.  Thanks again to Andre for
making this resource available.

Scott
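
P.S. To make John's shared-IRQ contrast above concrete: handlers that
bus_setup_intr() attaches to the same IRQ get chained onto a single
ithread, while each driver instance that creates its own taskqueue
gets a private thread.  A sketch with the same hypothetical names as
above (my_intr being an ordinary, non-FAST handler), assuming each
device has allocated its own struct resource for the shared IRQ:

	/* Both instances: their handlers share the IRQ's one ithread. */
	bus_setup_intr(sc0->dev, sc0->irq_res,
	    INTR_TYPE_NET | INTR_MPSAFE, my_intr, sc0, &sc0->irq_cookie);
	bus_setup_intr(sc1->dev, sc1->irq_res,
	    INTR_TYPE_NET | INTR_MPSAFE, my_intr, sc1, &sc1->irq_cookie);

	/* Per-instance taskqueues: one dedicated thread each. */
	sc0->tq = taskqueue_create_fast("my0_taskq", M_NOWAIT,
	    taskqueue_thread_enqueue, &sc0->tq);
	taskqueue_start_threads(&sc0->tq, 1, PI_NET, "my0 taskq");
	sc1->tq = taskqueue_create_fast("my1_taskq", M_NOWAIT,
	    taskqueue_thread_enqueue, &sc1->tq);
	taskqueue_start_threads(&sc1->tq, 1, PI_NET, "my1 taskq");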

