watchdog timeout

James Wyatt jwyatt at rwsystems.net
Mon Apr 13 12:55:44 UTC 2009


On Fri, 10 Apr 2009, Pyun YongHyeon wrote:
> On Wed, Apr 08, 2009 at 10:41:44AM +0200, xer wrote:
>> Hello
>> I have some problems with 3Com NICs after an upgrade from 5.5-STABLE to
>> 6.4-STABLE.
>>
>> This machine has two 3Com NICs (one is LAN, the other is WAN) and I see too
>> many "watchdog timeout" messages on both cards.
>> This up/down flapping on the cards interrupts clients that are
>> downloading from the Apache web server, especially on large files.
>>
>> --------------------------------------------
>> xer:/root# dmesg
>> xl1: watchdog timeout
>> xl1: link state changed to DOWN
>> xl1: link state changed to UP
>> xl1: watchdog timeout
>> xl1: link state changed to DOWN
>> xl1: link state changed to UP
>> xl1: watchdog timeout
>> xl1: link state changed to DOWN
>> xl1: link state changed to UP
>> ---------------------------------------------
 	[ . . . ]
>> As you can see, the cards are 3c905C-TX model.
>> Someone told me to change drivers, but I cannot understand this advice.
>> I got the same errors with the same cards on another mainboard: the
>> watchdog appeared after an upgrade from 5.4-STABLE to 6.4-STABLE.
>>
>> I don't think that changing the NICs' PCI slots will solve the problem. I
>> think that replacing the NICs might resolve the matter, but I cannot
>> physically access either FreeBSD box, because they are too far from me
>> (about 3500 km).
>>
>> I'm asking you for some advice on how I can fix this problem.
>> P.S. With both the old 5.4 and 5.5 kernels, the NICs were fine.
>
> I vaguely remember there were a couple of reports on xl(4) watchdog
> timeouts. I'm not sure this came from missing Tx interrupts but
> would you try attached patch?
> Note, it was generated against HEAD and you should test the
> attached patch on a local box prior to applying it to your production
> server.

Perhaps the case can convey the amount of hair I lost over this: HAVE YOU 
CHECKED YOUR BIOS AND ONBOARD IO SETTINGS?  I have been swapping boards 
for days for another firewall project and finally figured it out last 
night.  I would get messages like these, depending on the 3rd card used:

 	xl0: Watchdog Timeout
 	pcn0: Watchdog timeout

Finally the Sierra Nevada Porter kicked in and an old idea came back to 
me: I was running out of interrupts!  1/2 (^_^)  The hint was from a 
combination of having the earlier advice of "set the 'PNP OS' to false" 
fail and a Tom's Hardware mobo review complaint about 5-slot PCI mobos 
having IRQ sharing issues.  Thanks to you both wherever you are!

Finally I went in and disabled all the onboard IO I wasn't using to free 
up IRQs.  Disable the onboard serial ports if you aren't using them.  If 
you are, then an 8-port board or a USB serial adapter can save the COM1 
and COM2 IRQs.  No video IRQ is needed for simple console work.  No 
parallel port needed, so I saved that one.  No floppy nowadays, either.  
Lots of options for you!
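
To see whether several devices have landed on the same IRQ, one option is 
to group the boot-time probe lines by their "irq N" field.  The following 
is only a rough sketch in Python, assuming probe lines of the usual 
"xl0: <...> port ... irq 11 at device 9.0 on pci0" shape.  The function 
name shared_irqs and the sample lines are made up for illustration and 
are not taken from either machine in this thread.

```python
import re
from collections import defaultdict

# Matches probe lines such as "xl0: <...> port ... irq 11 at device 9.0 on pci0".
PROBE_LINE = re.compile(r"^(?P<dev>[a-z]+\d+): .*\birq (?P<irq>\d+)\b")


def shared_irqs(dmesg_text):
    """Return {irq: [devices]} for every IRQ claimed by more than one device."""
    by_irq = defaultdict(list)
    for line in dmesg_text.splitlines():
        match = PROBE_LINE.match(line.strip())
        if not match:
            continue  # not a probe line (e.g. "xl0: watchdog timeout")
        dev = match.group("dev")
        irq = int(match.group("irq"))
        if dev not in by_irq[irq]:
            by_irq[irq].append(dev)
    return {irq: devs for irq, devs in by_irq.items() if len(devs) > 1}


# Hypothetical sample input, invented for this sketch.
SAMPLE = """\
xl0: <hypothetical NIC> port 0xd000-0xd07f irq 11 at device 9.0 on pci0
pcn0: <hypothetical NIC> port 0xd400-0xd41f irq 11 at device 10.0 on pci0
dc0: <hypothetical NIC> port 0xd800-0xd8ff irq 5 at device 11.0 on pci0
xl0: watchdog timeout
"""

if __name__ == "__main__":
    for irq, devs in sorted(shared_irqs(SAMPLE).items()):
        print(f"irq {irq}: shared by {', '.join(devs)}")
```

On a live box the text could come from the output of dmesg or from 
/var/run/dmesg.boot.  A shared IRQ is not automatically a fault, so treat 
the result as a list of places to look rather than a diagnosis.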

I hadn't had IRQ issues in 15 years or so, so this sure reminded me 
that we still have the legacy x86 IRQ limitations.  This has pushed me to 
think more about shifting to trunked VLANs to save Ethernet ports and make 
recovery easier.

FWIW: I tend to use different cards so the network ports don't "float" to 
other numbers if one dies.  This made the problem worse because the 
drivers assign IRQs when they load, so it looked like the xl0 cards were 
the issue when 'de*' and 'dc*' worked.  Changing slots changed things too! 
This explains some of what your experience shows.

There is no way I can give back to this list what it's given me in 
technical support, but if this makes things work for you, then I've given 
*something* back and it really is a Good Friday - Jy@


More information about the freebsd-stable mailing list