em0 watchdog timeouts
Daniel Bond
db at danielbond.org
Mon Oct 5 14:19:10 UTC 2009
Hi,
I've been struggling with watchdog timeouts in 7.1/7.2-RELEASE for the
past 6 months too. It looks related.
I've tried to replace the hardware 3 times (2 different IBM x3755
chassis, one IBM x3650 chassis).
I first tried the onboard Broadcom NICs (bce-based, PCI-X), until I
had issues with "watchdog timeout".
I tried replacing them with a 4-port PCI-X Intel NIC, which gave me the
same problems. I was told that the 4-port Intel NICs have an onboard
bus controller that could cause trouble, so I replaced this with a
2-port PCI-e Intel NIC, which I was told by Sepherosa Ziehau was the
best performing gig-e NIC (rx/tx).
Still getting watchdog timeouts, I tried adjusting all sorts of sysctls
I found in mailing-list threads (disabling MSI/MSI-X interrupts,
adjusting rx/tx processing, etc.).
I tried upgrading the BIOS and the firmware on all kinds of components
(disks, BMC, etc.) to the newest versions. I also tried using a
different QLogic isp(4) FC controller (PCI-e).
No matter what I tried, I could not diagnose this problem, or at least
fix it. It also happened rarely enough that it was not easy to debug. I
would get a series of "watchdog timeout -- resetting" messages, until
the NIC would go completely offline, at which point I'd reboot it from
the console. This happened about once every 1-10 days, usually between
about 11:00 and 13:00. This machine has now been replaced with Linux,
unfortunately, just to avoid more customer complaints and downtime. The
IBM x3755 with FreeBSD 7.2, which was replaced with Linux, is still
online, and can be put at the disposal of any developers who would like
to debug this further.
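For anyone who wants to check whether their own timeouts cluster around
certain hours of the day, here is a minimal sketch. It assumes syslog
lines with the same "Mon DD HH:MM:SS host kernel:" prefix as the log
excerpts quoted in this thread, and it matches on the "watchdog timeout"
text. The function name watchdog_by_hour and the log path in the usage
comment are my own assumptions, not anything from the driver or the
base system.

```shell
#!/bin/sh
# Sketch: count "watchdog timeout" kernel messages per hour of day.
# Assumes syslog lines shaped like the ones quoted in this thread, where
# the third whitespace-separated field is HH:MM:SS. An illustrative line
# (composed from the prefix format and message text seen in the thread):
#   Oct  2 13:26:07 mango kernel: em4: watchdog timeout -- resetting

# watchdog_by_hour is a hypothetical helper name; it reads log lines on stdin
# and prints one "HH:00 count" line per hour that had at least one event.
watchdog_by_hour() {
    awk '/watchdog timeout/ {
             split($3, t, ":")      # t[1] is the hour part of HH:MM:SS
             count[t[1]]++
         }
         END {
             for (h in count) printf "%s:00 %d\n", h, count[h]
         }' | sort
}

# Usage (the log path is an assumption; adjust it to your syslog.conf):
#   watchdog_by_hour < /var/log/messages
```

If the counts pile up in one or two hours, that would point at a
periodic load (cron jobs, backups, client batch runs) rather than
something random, which may help narrow down what to look at next.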
Like Stefan Krueger mentioned, this machine is also running as an NFS
server, with a mix of BSD and Linux clients, and it's getting hit
pretty hard by clients.
Hope we can iron this bug out in the future.
Best regards,
Daniel Bond.
On Oct 2, 2009, at 10:36 PM, Rudy wrote:
>
> Ah, I'll stop messing with them.
>
>
> I just set them all to 0 to see if that will help and noticed the card
> was leaving tx_int_delay=1.
>
> # sysctl dev.em.4.debug=1
> Oct 2 13:26:07 mango kernel: em4: tx_int_delay = 1, tx_abs_int_delay = 0
> Oct 2 13:26:07 mango kernel: em4: rx_int_delay = 0, rx_abs_int_delay = 0
>
> # sysctl dev.em.4
> dev.em.4.%desc: Intel(R) PRO/1000 Network Connection 6.9.12
> dev.em.4.rx_int_delay: 0
> dev.em.4.tx_int_delay: 0
> dev.em.4.rx_abs_int_delay: 0
> dev.em.4.tx_abs_int_delay: 0
>
> Splitting traffic to different ports has brought down the watchdog
> events to once a day. ... essentially, I have a quad 30Mbps (not quad
> 1Gbps) card. heheh.
> Would turning off net.inet.ip.fastforwarding or any other setting
> help?
>
> Today, I set net.inet.ip.fw.enable=0 and I'll see if that helps. I
> have a feeling that isn't related to the NIC at all, but I'm not sure
> what else to try.
>
> Rudy
>
>
>
> Jack Vogel wrote:
>> Watchdog resets the adapter. Messing with these values is of
>> dubious value
>> anyway.
>>
>> Jack
>>
>>
>> On Fri, Oct 2, 2009 at 11:36 AM, Rudy <crapsh at monkeybrains.net>
>> wrote:
>>
>>
>>> I noticed something interesting.
>>>
>>> I set the rx_int_delay to 0:
>>> sysctl dev.em.5.rx_int_delay=0
>>>
>>> Checking via sysctl dev.em.5.debug=1 shows rx_int_delay is indeed 0:
>>> Oct 1 17:32:41 mango kernel: em5: rx_int_delay = 0, rx_abs_int_delay = 66
>>>
>>> After a watchdog event, sysctl dev.em.5.debug=1 shows rx_int_delay
>>> is now 32:
>>> Oct 2 11:29:49 mango kernel: em5: rx_int_delay = 32, rx_abs_int_delay = 66
>>>
>>> However, running sysctl dev.em.5 shows it as 0:
>>> dev.em.5.rx_int_delay: 0
>>> dev.em.5.tx_int_delay: 66
>>>
>>> Seems like the adapter and the kernel don't agree on the
>>> rx_int_delay
>>> value.
>>>
>>> Rudy
>>>
>>>
>>
>>
>