Interrupt storm when disconnecting sata drives in 7.0-RC2 and 6.3

W M Griffith griffith.wm at gmail.com
Wed Feb 13 02:34:11 UTC 2008


I've tested both 6.3 and 7.0-RC2 on the amd64 architecture, and both show an
interrupt storm on an ASUS A8N-SLI Premium (nForce 4) motherboard with an
AMD Opteron 175.

Relevant dmesg output:
atapci0: <nVidia nForce CK804 UDMA133 controller> port
0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xf000-0xf00f at device 6.0 on pci0
ata0: <ATA channel 0> on atapci0
ata1: <ATA channel 1> on atapci0
atapci1: <nVidia nForce CK804 SATA300 controller> port
0x9f0-0x9f7,0xbf0-0xbf3,0x970-0x977,0xb70-0xb73,0xd800-0xd80f mem
0xd3002000-0xd3002fff irq 21 at device 7.0 on pci0
ata2: <ATA channel 0> on atapci1
ata3: <ATA channel 1> on atapci1
atapci2: <nVidia nForce CK804 SATA300 controller> port
0x9e0-0x9e7,0xbe0-0xbe3,0x960-0x967,0xb60-0xb63,0xc400-0xc40f mem
0xd3001000-0xd3001fff irq 22 at device 8.0 on pci0
ata4: <ATA channel 0> on atapci2
ata5: <ATA channel 1> on atapci2
atapci3: <SiI SiI 3114 SATA150 controller> port
0x9000-0x9007,0x9400-0x9403,0x9800-0x9807,0x9c00-0x9c03,0xa000-0xa00f mem
0xd2000000-0xd20003ff irq 19 at device 10.0 on pci5
ata6: <ATA channel 0> on atapci3
ata7: <ATA channel 1> on atapci3
ata8: <ATA channel 2> on atapci3
ata9: <ATA channel 3> on atapci3

Also, there are lines of the form "ata0: [ITHREAD]" for all of ata0, ata1,
atapci1, ata2, ata3, atapci2, ata4, ata5, atapci3, ata6, ata7, ata8, and
ata9.  There is no such line for atapci0, which I assume is by design.

Using a Kingwin KF-1000-BK SATA hot swap rack, I can disconnect and reconnect
a SATA hard drive without powering down.  This is technically equivalent to
using eSATA.  I start with a hard drive in the rack on bootup, and here is
what I see:

[root at f7r2amd64 ~]$ atacontrol list
ATA channel 0:
    Master:      no device present
    Slave:       no device present
ATA channel 1:
    Master:      no device present
    Slave:       no device present
ATA channel 2:
    Master:      no device present
    Slave:       no device present
ATA channel 3:
    Master:      no device present
    Slave:       no device present
ATA channel 4:
    Master:  ad8 <WDC WD800JD-00LSA0/06.01D06> Serial ATA II
    Slave:       no device present
ATA channel 5:
    Master: ad10 <ST3250620NS/3.AEG> Serial ATA II
    Slave:       no device present
ATA channel 6:
    Master: ad12 <ST380815AS/3.AAA> Serial ATA v1.0
    Slave:       no device present
ATA channel 7:
    Master:      no device present
    Slave:       no device present
ATA channel 8:
    Master:      no device present
    Slave:       no device present
ATA channel 9:
    Master:      no device present
    Slave:       no device present
[root at f7r2amd64 ~]$ vmstat -i
interrupt                          total       rate
irq19: atapci3                        76          2
irq22: atapci2                      1234         35
irq23: nfe0                          250          7
cpu0: timer                        68395       1954
cpu1: timer                        68177       1947
Total                             138132       3946

The system is on ad8, a data drive is connected as ad10, and the removable
drive is ad12.  ad8 is on the nForce sata1 port (it isn't ad4 because of the
odd way ASUS labels its SATA ports), ad10 is on the nForce sata2 port, and
ad12 is on the SiI 3114 sata1 port.  I haven't cropped anything from the
vmstat output: I've disabled USB, firewire, and the second ethernet adapter
so that I can isolate what's going on.
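
The channel-to-device mapping described above can be pulled out of the
"atacontrol list" output mechanically.  This is only a minimal sketch,
assuming the output format shown in the listing; the function name
ata_devices is made up for illustration.

```sh
# ata_devices: hypothetical helper. Reads "atacontrol list" output on stdin
# and prints one "ataN device" pair per populated master/slave slot.
ata_devices() {
    awk '
        # "ATA channel 4:" -> remember the channel as "ata4"
        /^ATA channel/ { ch = $3; sub(/:$/, "", ch); ch = "ata" ch }
        # "Master:  ad8 <...>" has a device name in field 2;
        # empty slots read "no device present" and are skipped
        /^ *(Master|Slave):/ && $2 != "no" { print ch, $2 }
    '
}

# Intended use on the machine itself:
#   atacontrol list | ata_devices
```

For the listing above this should give "ata4 ad8", "ata5 ad10", and
"ata6 ad12", one pair per line.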

Now I detach ad12 (vmstat output trimmed to the relevant line):
[root at f7r2amd64 ~]$ atacontrol detach ata6
[root at f7r2amd64 ~]$ vmstat -i
irq19: atapci3                        76          0

Next I actually eject the drive:
[root at f7r2amd64 ~]$ vmstat -i ; sleep 10 ; vmstat -i
irq19: atapci3                   7266334      24548
irq19: atapci3                   8912703      29126

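Between the two samples the total grows by 1,646,369 in 10 seconds, so
interrupts are arriving at roughly 165,000 per second (the rate column is
only an average since boot).  The storm continues even after the drive is
re-inserted.  It stops after reconnecting with "atacontrol attach ata6".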
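
Since the rate column of "vmstat -i" is an average since boot, the
storm rate has to be worked out from the difference between two totals.
A rough sketch of that arithmetic follows; irq_total and irq_rate are
hypothetical helper names, and the parsing assumes the "vmstat -i"
layout shown above.

```sh
# irq_total: hypothetical helper. Reads "vmstat -i" output on stdin and
# prints the running total for the named device (e.g. atapci3).
# The total is the second-to-last column, so extra device names on a
# shared IRQ line do not shift it.
irq_total() {
    awk -v dev="$1" '$2 == dev { print $(NF - 1) }'
}

# irq_rate BEFORE AFTER INTERVAL: interrupts per second between two totals
# taken INTERVAL seconds apart (integer division, rounded down).
irq_rate() {
    echo $(( ($2 - $1) / $3 ))
}

# Intended use on the affected machine:
#   before=$(vmstat -i | irq_total atapci3)
#   sleep 10
#   after=$(vmstat -i | irq_total atapci3)
#   irq_rate "$before" "$after" 10
```

With the two totals from the transcript, "irq_rate 7266334 8912703 10"
prints 164636.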

Some variations on this theme:
1) If I do the above but do not re-insert the drive and then try "atacontrol
attach ata6", I get an "ATA_IDENTIFY timeout", and then a kernel panic: "Fatal
trap 12: page fault while in kernel mode".
2) If I try using the nForce sata3 ports, I always get some kind of kernel
panic when I insert a drive and run "atacontrol attach ata4".  The NVRAID
settings in the BIOS don't matter.  I don't know if anything is AHCI here.
3) If I try instead with a Dell Optiplex 330, which uses an Intel ICH7 for
sata and which is configured as AHCI in the BIOS (and fbsd shows as AHCI in
dmesg), I get exactly the same as I did with the SiI 3114, except that the
"atacontrol attach" when no device is attached simply detects no devices (no
kernel panic), and actually stops the interrupt storm.  When it is storming,
the rate is more like 190,000 per second.
4) I've tried using a JMicron JMB363 sata port on an add-in PCIe x1 card
(with the most recent firmware), but the attached drive is never detected, on
either computer.  No interrupt storm occurs through any of "atacontrol
attach" or detach, or drive removal or insertion, probably because I can
never detect any drives.  I think this card worked in 7.0-BETA4, but I
didn't know to check the interrupt rate.

I say in the subject line that this occurs on 6.3-RELEASE, but I've actually
only tested numbers 1, 2, and 4, and even then only with amd64.  I am making
a guess that 6.3-RELEASE would behave as 7.0-RC2 in case 3.  Two weeks ago,
I also tested 7.0-RC1/i386 in cases 1, 2, and 4, and I saw pretty much the
same behavior, but I haven't tested 7.0-RC2/i386.

As you might have guessed, my goal was to have a system where I could
hot-swap my back-up drives.  The current workaround is to always have some
drive in the rack and have it attached, since I really need to use the
Opteron system.  It's nice that Intel users don't even have to keep a drive
in the rack, though, since the empty ata channel is detected as having no
devices without a kernel panic.  Another option is to use USB or firewire,
but I planned on using several back-up drives and all the enclosures would
add up in cost.  With sata racks all you need is the bare drive and you get
speed, keep the server room tidy, and don't have to carry around extra power
supplies.
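
For the SiI 3114 case, the swap sequence can be wrapped so that
"atacontrol attach" is never run on an empty channel, which is what
caused the panic in variation 1.  This is a sketch only: swap_drive and
the ATACONTROL override are hypothetical names, the "yes" prompt is just
one way to confirm the drive is in, and it does nothing about the storm
itself, which still runs between ejecting the drive and the attach.

```sh
# ATACONTROL can be overridden (e.g. ATACONTROL=echo) for a dry run.
ATACONTROL=${ATACONTROL:-atacontrol}

# swap_drive CHANNEL: hypothetical helper. Detaches the channel, waits for
# the operator to confirm a drive is back in the rack, then re-attaches.
swap_drive() {
    channel=$1                          # e.g. ata6
    "$ATACONTROL" detach "$channel" || return 1
    printf 'Swap the drive on %s, then type "yes": ' "$channel" >&2
    read -r answer
    if [ "$answer" != "yes" ]; then
        # Refuse to attach: attaching an empty channel paniced here.
        echo "not attaching $channel: no confirmation given" >&2
        return 1
    fi
    "$ATACONTROL" attach "$channel"
}

# Intended use on the affected machine:
#   swap_drive ata6
```

Running it first with ATACONTROL=echo prints the commands instead of
executing them, which is a cheap way to check the sequence.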

I just want to say that I'm very grateful to the FreeBSD developers that I'm
able to use the SiI 3114 controller in any way.  As recently as 7.0-BETA4, I
think, I wasn't able to use it at all because it would always panic on
attach, and I'm thankful that it started working with RC1.  Of course, the
JMB363 seemed to stop detecting devices over those same releases, but it may
have had the same problems anyway, so I'm not complaining.

Finally, I have read much of the discussion concerning the interrupt storms
on the ASUS M2N-E from Sept 2007, and I thought they were related until I
found the problem on the Intel system.


More information about the freebsd-stable mailing list