Lost interrupts on SMP systems
Peter Trifonov
pvtrifonov at mail.ru
Thu Dec 30 00:45:17 PST 2004
Hello,
I have posted a message about this problem to freebsd-stable and
freebsd-hardware. After more detailed investigation I believe that the
problem is related to the SMP implementation.
I have a dual-processor PentiumPro box with 3 NICs:
xl0 is at irq 10,
fxp0, fxp1 are at irq 11.
If heavy traffic occurs on both fxp's, the system says
fxp0: device timeout
fxp1: device timeout
and they stop working. fxp's can be revived by doing
ifconfig fxp{0,1} down;ifconfig fxp{0,1} up
Sometimes this causes the system to say
fxp0: SCB timeout: 0x70 0x0 0x50 0x0
fxp1: SCB timeout: 0x20 0x0 0x50 0x0
However, the network comes back to life.
After I replaced both fxp's with 3Com NICs, the "fxp*: device timeout" messages
changed to "xl{1,2}: watchdog timeout". xl0 still works fine. Again, the
network can be fixed by doing ifconfig down&up.
Therefore, the problem is not likely to be related to fxp or xl drivers.
My current configuration is:
xl0: <3Com 3c905C-TX Fast Etherlink XL> port 0xf400-0xf47f mem
0xffbec000-0xffbec07f irq 10 at device 12.0 on pci0
xl1: <3Com 3c905B-TX Fast Etherlink XL> port 0xf000-0xf07f mem
0xffbdc000-0xffbdc07f irq 11 at device 13.0 on pci0
xl2: <3Com 3c905C-TX Fast Etherlink XL> port 0xec00-0xec7f mem
0xffbcc000-0xffbcc07f irq 11 at device 14.0 on pci0
Currently I have to run a cron script that checks the network every 5 minutes
and does ifconfig down&up.
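A minimal sketch of such a watchdog script follows. It is not the actual
script in use. The peer address, the interface list, the function names and
the install path are all assumptions made for illustration. It pings one peer
and, if there is no answer, takes each listed interface down and up again.

```sh
#!/bin/sh
# Sketch of a cron-driven watchdog: bounce interfaces when a peer stops answering.
# IFACES and PEER are illustrative assumptions, not values taken from the post.
IFACES="${IFACES:-xl1 xl2}"
PEER="${PEER:-192.0.2.1}"    # placeholder address from the documentation range

# Succeeds if the peer answers at least one of three echo requests.
peer_alive() {
    ping -c 3 "$1" >/dev/null 2>&1
}

# Take an interface down and bring it back up, as done by hand above.
bounce() {
    ifconfig "$1" down
    ifconfig "$1" up
}

# Bounce every listed interface only when the peer is unreachable.
check_and_revive() {
    if peer_alive "$PEER"; then
        return 0
    fi
    for ifc in $IFACES; do
        bounce "$ifc"
    done
}

# Do the work only when invoked with "run", e.g. from root's crontab
# (hypothetical install path):
#   */5 * * * * /usr/local/sbin/ifwatch.sh run
if [ "${1:-}" = "run" ]; then
    check_and_revive
fi
```

Testing a single peer cannot tell which interface has stalled, so this sketch
bounces all of them. A per-interface peer would be more precise.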
I believe that IRQ sharing on SMP causes some interrupts to be lost. Please
let me know if any patches/workarounds are known.
--
With best regards,
P. Trifonov