em interrupt storm

Scott Long scottl at samsco.org
Wed Nov 23 23:16:37 GMT 2005


Darren Pilgrim wrote:
> From: Scott Long
> 
>>>>On Tue, 2005-11-22 at 22:03 -0500, Kris Kennaway wrote:
>>>>
>>>>
>>>>>I am seeing the em driver undergoing an interrupt storm whenever the
>>>>>amr driver receives interrupts.  In this case I was running newfs on
>>>>>the amr array and em0 was not in use:
> 
> <...>
> 
>>>>>This is on both 6.0-RELEASE and 6.0-STABLE.
>>
>>This is apparently a side effect of how we process interrupts, which is
>>different from Windows and Linux.  Since we mask the interrupt in the
>>APIC while the ithread runs, the Intel hardware tries to outsmart us
>>and continue delivering the interrupt via irq16.  There have been
>>rumors on ways to turn off this 'feature', but none of them seem to
>>work.  Since ithreads are integral to SMPng, and masking the APIC pins
>>is integral to making ithreads work, the solution will probably be to
>>be more aggressive in adopting MSI, and in doing filtered interrupt 
>>handlers that don't require the APIC to be masked.  Note that Solaris
>>and Darwin would likely exhibit the same problem since they handle
>>interrupts similarly to us.
> 
> 
> Scott, from your message the problem is going to occur on any E7520 chipset,
> regardless of the NICs, disk controllers, etc. used in the system.  Does
> this problem also exist for the E7525 chipset?  I know the two are almost
> identical, but I figured it would be good to make sure.

I've directly observed it on 7501WV2 and 7520BD2 boards.  I don't know 
if it's a problem that will affect other board configurations or cousin
chipsets like the 7505 and 7525.

> 
> I ask because I have a machine with a Supermicro X6DAL-G (which uses the
> E7525) on which I got messages about interrupt storms during builds.  This
> was back before if_em was fixed, so they were shelved along with use of the
> em interface until if_em was fixed.  My apologies for not having the exact
> messages.  I don't have access to the logs.

This could be the same problem, yes.

> 
> Until this gets "fixed" in FreeBSD, what should those of us who are
> effectively stuck with this hardware do to avoid the problem?  Does the
> problem exist in RELENG_4?
> 
> 

What I've done is vacate the use of irq16 on my machines by disabling
things like usb and if_em.  This isn't ideal, of course.  The storming
doesn't usually cause a problem, but it can affect performance,
especially if USB is involved.  I only do this when I'm testing
performance, otherwise I leave everything enabled and don't worry about
it.  4.x won't see this problem since it handles interrupts in the
more traditional way, but using 4.x also has many other tradeoffs that
may or may not be worthwhile.
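
To make the mechanism quoted above a bit more concrete, here is a toy
model in C.  It is only a sketch under stated assumptions, not kernel
code: simulate(), enum irq_model, struct sim_result and the
ithread_latency parameter are all hypothetical names, and the rule "one
stray irq16 delivery per time step while the pin is masked and the device
is still asserting" is my simplification of the chipset behaviour, not
something documented.  It compares masking the pin until an ithread runs
with a filter-style handler that quiets the device at the source.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only.  None of these names are FreeBSD kernel APIs. */
enum irq_model { MODEL_ITHREAD_MASKED, MODEL_FILTER };

struct sim_result {
    int handled;      /* device interrupts that were fully serviced */
    int stray_irq16;  /* assumed fallback deliveries seen on irq16 */
};

/*
 * Simulate `events` level-triggered interrupts.  `ithread_latency` is the
 * number of time steps between masking the pin and the ithread actually
 * servicing the device (a hypothetical knob for this sketch).
 */
struct sim_result
simulate(enum irq_model model, int events, int ithread_latency)
{
    struct sim_result r = { 0, 0 };

    assert(events >= 0 && ithread_latency >= 0);

    for (int e = 0; e < events; e++) {
        bool line_asserted = true;   /* device raises its interrupt line */
        bool pin_masked = false;

        if (model == MODEL_ITHREAD_MASKED) {
            pin_masked = true;       /* pin is masked, ithread scheduled */
            for (int t = 0; t < ithread_latency; t++) {
                /* Assumption: the chipset reroutes to irq16 each step. */
                if (line_asserted && pin_masked)
                    r.stray_irq16++;
            }
            line_asserted = false;   /* ithread services the device */
            pin_masked = false;      /* pin is unmasked again */
        } else {
            /* Filter quiets the device immediately; pin is never masked. */
            line_asserted = false;
        }

        if (!line_asserted && !pin_masked)
            r.handled++;
    }
    return r;
}
```

In this model, simulate(MODEL_ITHREAD_MASKED, 100, 5) services all 100
events but also counts 500 stray deliveries on irq16, while
simulate(MODEL_FILTER, 100, 5) services the same 100 events with none.
The numbers only illustrate why the stray traffic scales with how long
the pin stays masked; they say nothing about real hardware rates.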


Scott


More information about the freebsd-current mailing list