Lost interrupts on SMP systems

Peter Trifonov pvtrifonov at mail.ru
Thu Dec 30 00:45:17 PST 2004


Hello,
I have posted a message about this problem to freebsd-stable and 
freebsd-hardware. After more detailed investigation I believe that the 
problem is related to the SMP implementation.
I have a dual-processor Pentium Pro box with 3 NICs:
xl0 is at irq 10,
fxp0, fxp1  are at irq 11.
Under heavy traffic on both fxp interfaces the system reports
fxp0: device timeout
fxp1: device timeout
and both interfaces stop working. They can be revived by doing 
ifconfig fxp{0,1} down;ifconfig fxp{0,1} up
Sometimes this causes the system to say
fxp0: SCB timeout: 0x70 0x0 0x50 0x0 
fxp1: SCB timeout: 0x20 0x0 0x50 0x0
However, the network comes back to life.

After I replaced both fxp cards with 3Com NICs, the "fxp*: device timeout" 
messages changed to "xl{1,2}: watchdog timeout". xl0 still works fine. Again, 
the network can be fixed by doing ifconfig down & up.
Therefore, the problem is unlikely to be in the fxp or xl drivers.

My current configuration is:
xl0: <3Com 3c905C-TX Fast Etherlink XL> port 0xf400-0xf47f mem 
0xffbec000-0xffbec07f irq 10 at device 12.0 on pci0
xl1: <3Com 3c905B-TX Fast Etherlink XL> port 0xf000-0xf07f mem 
0xffbdc000-0xffbdc07f irq 11 at device 13.0 on pci0
xl2: <3Com 3c905C-TX Fast Etherlink XL> port 0xec00-0xec7f mem 
0xffbcc000-0xffbcc07f irq 11 at device 14.0 on pci0


Currently I have to run a cron script that checks the network every 5 minutes 
and does ifconfig down & up.
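
A minimal sketch of what such a cron script might look like. The actual 
script is not shown in this post, so everything here is an assumption: the 
function name check_iface, the idea of pinging one known host per interface, 
and the PING/IFCONFIG override variables (added only so the logic can be 
dry-run without touching real interfaces).

```shell
#!/bin/sh
# Hypothetical watchdog sketch, not the script actually used on this box.
# PING and IFCONFIG may be overridden for a dry run (assumption, for testing).
PING=${PING:-ping}
IFCONFIG=${IFCONFIG:-ifconfig}

# check_iface IFACE PEER
# Send 3 echo requests to PEER, a host reachable only through IFACE.
# If none is answered, take IFACE down and bring it back up.
check_iface() {
    if ! $PING -c 3 "$2" >/dev/null 2>&1; then
        $IFCONFIG "$1" down
        $IFCONFIG "$1" up
        echo "reset $1"
    fi
}
```

The script body would then call, for example, check_iface xl1 192.0.2.1 and 
check_iface xl2 198.51.100.1 (placeholder addresses; use a host on each 
attached network), and would be run from root's crontab every 5 minutes. 
This only masks the symptom; it does not address the lost interrupts.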

I believe that IRQ sharing on SMP causes some interrupts to be lost. Please 
let me know if any patches or workarounds are known.

-- 
With best regards,
P. Trifonov