em network issues

Bruce Evans bde at zeta.org.au
Thu Oct 19 23:42:15 UTC 2006


On Thu, 19 Oct 2006, John Polstra wrote:

> On 19-Oct-2006 Scott Long wrote:
>> The performance measurements that Andre and I did early this year showed
>> that the INTR_FAST handler provided a very large benefit.
>
> I'm trying to understand why that's the case.  Is it because an
> INTR_FAST interrupt doesn't have to be masked and unmasked in the
> APIC?  I can't see any other reason for much of a performance
> difference in that driver.  With or without INTR_FAST, you've got
> the bulk of the work being done in a background thread -- either the
> ithread or the taskqueue thread.  It's not clear to me that it's any
> cheaper to run a task than it is to run an ithread.

It's very unlikely to be because masking in the APIC is slow.  The
APIC is fast compared with the PIC, and even with the PIC it takes a
very high interrupt rate (say 20 kHz) for the PIC overhead to become
noticeable (say 5-10%).  Such interrupt rates may occur, but if they
do you've probably already lost.
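(For a rough sense of scale, assuming an 8259 PIC mask/unmask pair costs
a few microseconds of ISA-speed I/O port accesses -- my ballpark figures,
not measurements from this thread:

     20,000 interrupts/s * ~2.5-5 us per mask/unmask pair
       = 50-100 ms of CPU per second
       = ~5-10% overhead.)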

Previously I said that the difference might be due to interrupt
coalescing, but that I wouldn't expect that to happen.  Now I see how
it can happen on loaded systems: the system might be so loaded that
it often doesn't get around to running the task before a new device
interrupt would occur if device interrupts weren't turned off.  The
scheduling of the task might accidentally be optimal, or at least good
enough.  A task might accidentally work better than a software ithread
because it has lower priority, and similarly, a software ithread might
work better than a hardware ithread.  The lower-priority threads can
also be preempted, at least with PREEMPTION configured.  This is bad
for them but good for whatever preempts them.  Apart from this, it's
_more_ expensive to run a task plus an interrupt handler (even if the
interrupt handler is fast) than to run a single interrupt handler, more
expensive to switch between the handlers, and more expensive again if
PREEMPTION actually has much effect -- then more switches occur.
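(For anyone who hasn't looked at the driver, the INTR_FAST-plus-taskqueue
arrangement being discussed looks roughly like the sketch below.  This is
not the em code itself: the softc layout and the my_hw_disable_intr() /
my_hw_enable_intr() / my_process_ring() helpers are placeholders, and how
the taskqueue is created varies between versions; only the TASK_INIT /
taskqueue_enqueue / bus_setup_intr(INTR_FAST) usage is meant literally.
The point is that the interrupt stays masked at the *device* until the
task has run, which is where the accidental coalescing comes from.

	#include <sys/param.h>
	#include <sys/systm.h>
	#include <sys/bus.h>
	#include <sys/taskqueue.h>

	struct my_softc {
		struct task	  rxtx_task;	/* deferred rx/tx work */
		struct taskqueue *tq;		/* fast taskqueue, created at attach */
		/* ... rings, register mappings, etc. ... */
	};

	/*
	 * At attach time (error handling and taskqueue creation omitted):
	 *	TASK_INIT(&sc->rxtx_task, 0, my_rxtx_task, sc);
	 *	bus_setup_intr(dev, sc->irq_res,
	 *	    INTR_TYPE_NET | INTR_MPSAFE | INTR_FAST,
	 *	    my_intr_fast, sc, &sc->intr_cookie);
	 */

	/*
	 * INTR_FAST handler: runs at interrupt level, must not sleep or
	 * take ordinary mutexes.  It only masks further interrupts at the
	 * device (not in the APIC) and queues the real work.
	 */
	static void
	my_intr_fast(void *arg)
	{
		struct my_softc *sc = arg;

		my_hw_disable_intr(sc);		/* placeholder register write */
		taskqueue_enqueue(sc->tq, &sc->rxtx_task);
	}

	/*
	 * Task handler: runs later in a taskqueue thread.  Interrupts the
	 * device would have raised in the meantime are absorbed into this
	 * single pass over the rings -- the coalescing described above.
	 */
	static void
	my_rxtx_task(void *arg, int pending)
	{
		struct my_softc *sc = arg;

		my_process_ring(sc);		/* placeholder rx/tx processing */
		my_hw_enable_intr(sc);		/* placeholder register write */
	}

With a plain ithread handler there is no such sketch to draw: the device
interrupt is masked in the APIC, the ithread does all the work, and the
interrupt is unmasked when the ithread finishes.)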

> A difference might show up if you had two or more em devices sharing
> the same IRQ.  Then they'd share one ithread, but would each get their
> own taskqueue thread.  But sharing an IRQ among multiple gigabit NICs
> would be avoided by anyone who cared about performance, so it's not a
> very interesting case.  Besides, when you first committed this
> stuff, INTR_FAST interrupts were not sharable.

Sharing an IRQ among a single gigabit NIC and other slower devices is
even less interesting :-).

It can be hard to measure performance, especially when there are a lot
of threads or a lot of fast interrupt handlers.  If the performance
benefits are due to accidental scheduling, then they might vanish under
different loads.

Bruce
