em0 watchdog timeouts

Daniel Bond db at danielbond.org
Mon Oct 5 14:19:10 UTC 2009


Hi,

I've been struggling with watchdog timeouts in 7.1/7.2-RELEASE for the
past 6 months too. It looks related.

I've tried replacing the hardware three times (two different IBM x3755
chassis, one IBM x3650 chassis).
I first tried the onboard Broadcom NICs (bce(4), PCI-X based), until I
ran into "watchdog timeout" problems.

I tried replacing them with a 4-port PCI-X Intel NIC, which gave me the
same problems. I was told that the 4-port Intel NICs have an onboard bus
controller that could cause trouble, so I replaced it with a 2-port PCI-e
Intel NIC, which Sepherosa Ziehau told me was the best-performing gig-E
NIC (rx/tx).

Still getting watchdog timeouts, I tried tuning all sorts of sysctls I
found in mailing-list threads (disabling MSI/MSI-X interrupts, adjusting
rx/tx processing, etc.).
I tried upgrading the BIOS and the firmware on all kinds of components
(disks, BMC, etc.) to the newest versions. I also tried a different
QLogic isp(4) FC controller (PCI-e).

No matter what I tried, I could neither diagnose this problem nor fix
it. It also happened rarely enough to make debugging difficult. I would
get a series of "watchdog timeout -- resetting" messages until the NIC
went completely offline, at which point I'd reboot the machine from the
console.

This happened about once every 1-10 days, usually between 11:00 and
13:00. Unfortunately, this machine has now been replaced with a Linux
machine, just to avoid more customer complaints and downtime. The IBM
x3755 with FreeBSD 7.2 that was replaced is still online and can be put
at the disposal of any developers who would like to debug this further.

Like the setup Stefan Krueger mentioned, this machine also runs as an
NFS server, with a mix of BSD and Linux clients, and it gets hit pretty
hard by them.


Hope we can iron this bug out in the future.


Best regards,


Daniel Bond.



On Oct 2, 2009, at 10:36 PM, Rudy wrote:

>
> Ah, I'll stop messing with them.
>
>
> I just set them all to 0 to see if that will help and noticed the card
> was leaving tx_int_delay=1.
>
> # sysctl dev.em.4.debug=1
> Oct  2 13:26:07 mango kernel: em4: tx_int_delay = 1, tx_abs_int_delay = 0
> Oct  2 13:26:07 mango kernel: em4: rx_int_delay = 0, rx_abs_int_delay = 0
>
> # sysctl dev.em.4
> dev.em.4.%desc: Intel(R) PRO/1000 Network Connection 6.9.12
> dev.em.4.rx_int_delay: 0
> dev.em.4.tx_int_delay: 0
> dev.em.4.rx_abs_int_delay: 0
> dev.em.4.tx_abs_int_delay: 0
>
> Splitting traffic to different ports has brought down the watchdog
> events to once a day.  ... essentially, I have a quad 30Mbps (not quad
> 1Gbps) card.  heheh.
> Would turning off net.inet.ip.fastforwarding or any other setting help?
>
> Today, I set net.inet.ip.fw.enable=0 and I'll see if that helps. I have
> a feeling that isn't related to the NIC at all, but I'm not sure what
> else to try.
>
> Rudy
>
>
>
> Jack Vogel wrote:
>> Watchdog resets the adapter. Messing with these values is of dubious
>> value anyway.
>>
>> Jack
>>
>>
>> On Fri, Oct 2, 2009 at 11:36 AM, Rudy <crapsh at monkeybrains.net> wrote:
>>
>>
>>> I noticed something interesting.
>>>
>>> I set the rx_int_delay to 0:
>>> sysctl dev.em.5.rx_int_delay=0
>>>
>>> Checking via sysctl dev.em.5.debug=1 shows rx_int_delay is indeed 0:
>>> Oct  1 17:32:41 mango kernel: em5: rx_int_delay = 0, rx_abs_int_delay = 66
>>>
>>> After a watchdog event, sysctl dev.em.5.debug=1 shows rx_int_delay is
>>> now 32:
>>> Oct  2 11:29:49 mango kernel: em5: rx_int_delay = 32, rx_abs_int_delay = 66
>>>
>>> However, running sysctl dev.em.5 shows it as 0:
>>> dev.em.5.rx_int_delay: 0
>>> dev.em.5.tx_int_delay: 66
>>>
>>> Seems like the adapter and the kernel don't agree on the rx_int_delay
>>> value.
>>>
>>> Rudy
>>>
>>>
>>
>>
>
> _______________________________________________
> freebsd-stable at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe at freebsd.org"
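Rudy's observation (`sysctl dev.em.5.debug=1` logging rx_int_delay = 32 after a watchdog event while `sysctl dev.em.5` still reported 0) is easy to miss when scanning logs by hand. Below is a minimal Python sketch that parses both formats exactly as they are quoted in this thread and lists every interrupt-delay setting whose two values disagree. The function names are hypothetical. It assumes the kernel log lines and the `sysctl dev.em.N` output look like the samples above, and it compares the raw numbers as printed, without any unit conversion, because whether the debug output and the sysctls use the same units is not established here.

```python
import re

# em(4) debug line in the kernel log, e.g.
# "Oct  2 11:29:49 mango kernel: em5: rx_int_delay = 32, rx_abs_int_delay = 66"
DEBUG_RE = re.compile(r"\bem(\d+): (.*)$")
PAIR_RE = re.compile(r"(\w+_int_delay) = (\d+)")

# One line of `sysctl dev.em.N` output, e.g. "dev.em.5.rx_int_delay: 0"
SYSCTL_RE = re.compile(r"^dev\.em\.(\d+)\.(\w+_int_delay): (\d+)$")


def parse_debug_log(lines):
    """Map (unit, setting) -> value from em(4) debug lines; other lines are ignored."""
    values = {}
    for line in lines:
        m = DEBUG_RE.search(line)
        if not m:
            continue
        unit = int(m.group(1))
        for name, value in PAIR_RE.findall(m.group(2)):
            values[(unit, name)] = int(value)
    return values


def parse_sysctl(lines):
    """Map (unit, setting) -> value from `sysctl dev.em.N` output lines."""
    values = {}
    for line in lines:
        m = SYSCTL_RE.match(line.strip())
        if m:
            values[(int(m.group(1)), m.group(2))] = int(m.group(3))
    return values


def mismatches(debug_values, sysctl_values):
    """Return (unit, setting, debug_value, sysctl_value) for settings present in both that differ."""
    out = []
    for key in sorted(debug_values.keys() & sysctl_values.keys()):
        if debug_values[key] != sysctl_values[key]:
            unit, name = key
            out.append((unit, name, debug_values[key], sysctl_values[key]))
    return out


if __name__ == "__main__":
    # Sample lines copied from Rudy's message (em5, after a watchdog reset).
    log = ["Oct  2 11:29:49 mango kernel: em5: rx_int_delay = 32, rx_abs_int_delay = 66"]
    sysctl_out = ["dev.em.5.rx_int_delay: 0", "dev.em.5.tx_int_delay: 66"]
    for unit, name, dbg, sc in mismatches(parse_debug_log(log), parse_sysctl(sysctl_out)):
        print(f"em{unit}: {name} debug={dbg} sysctl={sc}")
```

With the em5 lines quoted above it prints `em5: rx_int_delay debug=32 sysctl=0`. Settings that appear in only one of the two outputs (here rx_abs_int_delay and tx_int_delay) are skipped rather than reported, so the sketch flags only direct disagreements.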


