ICH7 SATA and em interrupt sharing

Patrick M. Hausen hausen at punkt.de
Mon Aug 21 19:52:09 UTC 2006


And yet more testing ...

I rebuilt my kernel without USB devices and made sure
atapci1 doesn't share an interrupt with anything:

pcib1: 16
pcib2: 20
em0: 16
em1: 17
fxp0: 16
atapci1: 19
atkbdc0: 1
atkbd0: 1
sio0: 4
sio1: 3
ppc0: 7
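
A listing like this can be checked for shared interrupts mechanically by grouping the devices by IRQ number. What follows is only a minimal Python sketch: the function name shared_irqs and the "device: irq" input format are assumptions taken from the listing above, not the output format of any particular FreeBSD tool.

```python
from collections import defaultdict

# Listing copied from this message, one "device: irq" pair per line.
LISTING = """\
pcib1: 16
pcib2: 20
em0: 16
em1: 17
fxp0: 16
atapci1: 19
atkbdc0: 1
atkbd0: 1
sio0: 4
sio1: 3
ppc0: 7
"""

def shared_irqs(listing):
    """Return {irq: [devices]} for every IRQ used by more than one device."""
    by_irq = defaultdict(list)
    for line in listing.splitlines():
        line = line.strip()
        if not line:
            continue
        device, _, irq = line.partition(":")
        by_irq[int(irq)].append(device.strip())
    # Keep only the IRQs that have at least two devices attached.
    return {irq: devs for irq, devs in by_irq.items() if len(devs) > 1}

if __name__ == "__main__":
    for irq, devs in sorted(shared_irqs(LISTING).items()):
        print(f"irq {irq}: {', '.join(devs)}")
```

For the listing above this reports IRQ 1 (atkbdc0, atkbd0) and IRQ 16 (pcib1, em0, fxp0) as shared, while atapci1 is alone on IRQ 19.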

Side note: on this particular box I had to leave the USB devices
enabled in the BIOS setup, otherwise em0 would end up on the same
interrupt as atapci1 |-)

Then I ran make buildworld and, in parallel, started to transfer a large
file via FTP (done by fetching a sparse file of 10 GB), maxing out
our 100 Mbit/s LAN.
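
One way to create such a sparse test file (a sketch, not necessarily how it was done here) is to extend an empty file to the target size without writing any data. The function name make_sparse_file and the file name are hypothetical, and the sketch assumes the file system supports sparse files.

```python
import os
import tempfile

def make_sparse_file(path, size_bytes):
    """Create (or overwrite) a file of size_bytes without writing any data.

    On file systems with sparse-file support the file occupies almost no
    disk space, but reads back as size_bytes of zero bytes.
    """
    with open(path, "wb") as f:
        f.truncate(size_bytes)
    return os.path.getsize(path)

if __name__ == "__main__":
    ten_gb = 10 * 1024**3
    # Hypothetical location; put it wherever the FTP server can read it.
    target = os.path.join(tempfile.gettempdir(), "sparse-10g.bin")
    print(make_sparse_file(target, ten_gb))
```

Because the file contains no real data, reading it puts very little load on the disk; most of the work during the transfer lands on the network side.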

*boom* - or so I thought ;-) The ssh session was stuck, the system did
not respond to ICMP echo. OK, wait until tomorrow morning to reset it ...
... just gave it one more ping an hour later, and the machine was
alive again! It did not panic/reboot, the buildworld was running and
the file transfer was still in progress.

In /var/log/messages I found:

Aug 21 21:37:08 tomcat kernel: em0: Missing Tx completion interrupt!
Aug 21 21:39:55 tomcat kernel: em0: Missing Tx completion interrupt!
Aug 21 21:40:29 tomcat kernel: em0: Missing Tx completion interrupt!

Seems like for some reason the network card blocked for a couple
of minutes, then resumed.
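
To see how far apart these messages are, the syslog timestamps can be parsed and subtracted. This is a minimal Python sketch; the function name gaps_between_events is hypothetical, and the year has to be supplied separately because syslog lines do not carry one (2006 is taken from the date of this message).

```python
from datetime import datetime

MARKER = "Missing Tx completion interrupt"

# Log lines copied from this message.
LOG = """\
Aug 21 21:37:08 tomcat kernel: em0: Missing Tx completion interrupt!
Aug 21 21:39:55 tomcat kernel: em0: Missing Tx completion interrupt!
Aug 21 21:40:29 tomcat kernel: em0: Missing Tx completion interrupt!
"""

def gaps_between_events(log_text, year=2006):
    """Return the seconds between consecutive lines containing MARKER."""
    stamps = []
    for line in log_text.splitlines():
        line = line.strip()
        if MARKER not in line:
            continue
        # The syslog timestamp is the first 15 characters: "Aug 21 21:37:08".
        stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        stamps.append(stamp)
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]

if __name__ == "__main__":
    print(gaps_between_events(LOG))
```

For the three lines above the gaps are 167 and 34 seconds. Note that this only measures when the driver logged the condition, not how long the interface was actually unresponsive.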

This was all with debug.mpsafenet set to 1. Now I'm running the same
stress test with debug.mpsafenet set to 0 and I haven't seen any
problem/hang at all.

Wait a minute ... now as I'm typing this message, ssh to the
box hangs again. Damn.

I think I'll try the fxp interface for production use and disable the
onboard Gigabit NICs.

Now the ssh session is responding again while the file transfer reports
"Connection reset by peer".

Dmesg shows:

em0: Missing Tx completion interrupt!
em0: Missing Tx completion interrupt!
em0: Missing Tx completion interrupt!
em0: Missing Tx completion interrupt!
em0: Missing Tx completion interrupt!
em0: Missing Tx completion interrupt!

I'm still not able to really reproduce the SATA problem others are
reporting, besides forcing em0 to share its interrupt with the
SATA controller. This can easily be avoided - at least with our
hardware.


Regards,

Patrick M. Hausen
Leiter Netzwerke und Sicherheit
-- 
punkt.de GmbH         Internet - Dienstleistungen - Beratung
Vorholzstr. 25        Tel. 0721 9109 -0 Fax: -100
76137 Karlsruhe       http://punkt.de


More information about the freebsd-stable mailing list