em0 watchdog timeouts

Jack Vogel jfvogel at gmail.com
Mon Oct 5 16:57:50 UTC 2009


This posting just muddies the issue. First you talk about having a problem
that involves Broadcom; OK, so post about that on something other than em :)

Then you make some references to hardware that you "might have bought"
but didn't. I'm not about debugging 'possible worlds' problems, so I
can't help you there either :)

Finally you never say what the actual hardware is, other than a person who
I do not know told you it was the best performer... so, what exactly is it?

You have a problem once every 10 days, and at a specific time no less.
This almost always means something in your environment: a cron job run
amok, a piece of hardware that resets, I dunno, but the last thing I would
suspect given this description is the driver.
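
One way to check the "specific time" pattern is to bucket the watchdog messages by hour of day and compare the result against the cron schedule. The following is only a minimal sketch: the line format is assumed from the kernel messages quoted in this thread, and the sample lines (timestamps included) are made up for illustration, not taken from a real log.

```python
import re
from collections import Counter

# Assumed syslog shape: "Mon DD HH:MM:SS host kernel: emN: watchdog timeout ..."
LINE_RE = re.compile(
    r"^\w{3}\s+\d{1,2}\s+(\d{2}):\d{2}:\d{2}\s+\S+\s+kernel:\s+(\w+):\s+watchdog timeout"
)

def watchdog_hours(lines):
    """Count watchdog timeout messages per (interface, hour of day)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            counts[(m.group(2), int(m.group(1)))] += 1
    return counts

# Hypothetical sample lines, for illustration only.
sample = [
    "Oct  2 11:29:49 mango kernel: em5: watchdog timeout -- resetting",
    "Oct  3 11:41:02 mango kernel: em5: watchdog timeout -- resetting",
    "Oct  3 12:05:10 mango kernel: em4: watchdog timeout -- resetting",
    "Oct  3 12:05:11 mango kernel: em4: link state changed to UP",
]

for (ifname, hour), n in sorted(watchdog_hours(sample).items()):
    print(f"{ifname} {hour:02d}:00 {n}")
```

On a real machine the same function could be fed the system log, for example watchdog_hours(open("/var/log/messages")), assuming kernel messages are logged there. A cluster in one hour would point at something scheduled rather than at the driver.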

You need a good sysadmin for this debugging I would venture, not a driver
developer.

Jack


On Mon, Oct 5, 2009 at 7:19 AM, Daniel Bond <db at danielbond.org> wrote:

> Hi,
>
> I've been struggling with watchdog timeouts in 7.1/7.2-RELEASE for the past
> 6 months too. It looks related.
>
> I've tried to replace the hardware 3 times (2 different IBM x3755 chassis,
> one IBM x3650 chassis).
> I tried first with the onboard Broadcom NICs (bce-based, PCI-X), until I
> had issues with "watchdog timeout".
>
> I tried replacing them with a 4-port PCI-X Intel NIC, which gave me the same
> problems. I was told that the 4-port Intel NICs had an onboard
> bus-controller that could cause trouble, so I replaced this with a 2-port
> PCI-e Intel, which I was told by a Sepherosa Ziehau was the best performing
> gig-e NIC (rx/tx).
>
> Still getting watchdog timeouts, I tried tuning all sorts of sysctls I
> found in mailing-list threads (disable msi/msix interrupts, adjust rx/tx
> processing, etc, etc).
> I tried upgrading BIOS, firmware on all kinds of stuff (disks, BMC, etc,
> etc) to newest version. I also tried using a different qlogic isp(4)
> FC-controller (PCI-e).
>
> No matter what I tried, I could not diagnose this problem, or at least fix
> it. It also happened rarely enough that it was not easy to debug. I would
> get a series of "watchdog timeout -- resetting" messages, until the NIC
> would go completely offline, at which point I'd reboot it from console.
>
> This happened about once every 1-10 days, usually between 11:00 and 13:00.
> This machine has now been replaced with Linux, unfortunately, just to avoid
> more customer complaints and downtime. The IBM x3755 with FreeBSD 7.2, which
> was replaced with Linux, is still online, and can be put at the disposal of
> any developers who would like to debug this further.
>
> Like Stefan Krueger mentioned, this machine is also running as NFS server,
> with a mix of BSD and Linux clients, and it's getting hit pretty hard by
> clients.
>
>
> Hope we can iron this bug out, in the future.
>
>
> Best regards,
>
>
> Daniel Bond.
>
>
>
>
> On Oct 2, 2009, at 10:36 PM, Rudy wrote:
>
>
>> Ah, I'll stop messing with them.
>>
>>
>> I just set them all to 0 to see if that will help and noticed the card
>> was leaving tx_int_delay=1.
>>
>> # sysctl dev.em.4.debug=1
>> Oct  2 13:26:07 mango kernel: em4: tx_int_delay = 1, tx_abs_int_delay = 0
>> Oct  2 13:26:07 mango kernel: em4: rx_int_delay = 0, rx_abs_int_delay = 0
>>
>> # sysctl dev.em.4
>> dev.em.4.%desc: Intel(R) PRO/1000 Network Connection 6.9.12
>> dev.em.4.rx_int_delay: 0
>> dev.em.4.tx_int_delay: 0
>> dev.em.4.rx_abs_int_delay: 0
>> dev.em.4.tx_abs_int_delay: 0
>>
>> Splitting traffic to different ports has brought down the watchdog
>> events to once a day.  ... essentially, I have a quad 30Mbps (not quad
>> 1Gbps) card.  heheh.
>> Would turning off net.inet.ip.fastforwarding or any other setting help?
>>
>> Today, I set net.inet.ip.fw.enable=0 and I'll see if that helps.  I have
>> a feeling that isn't related to the NIC at all, but I'm not sure what
>> else to try.
>>
>> Rudy
>>
>>
>>
>> Jack Vogel wrote:
>>
> >>> Watchdog resets the adapter. Messing with these values is of dubious
> >>> value anyway.
>>>
>>> Jack
>>>
>>>
>>> On Fri, Oct 2, 2009 at 11:36 AM, Rudy <crapsh at monkeybrains.net> wrote:
>>>
>>>
> >>>> I noticed something interesting.
> >>>>
> >>>> I set the rx_int_delay to 0:
> >>>> sysctl dev.em.5.rx_int_delay=0
> >>>>
> >>>> Checking via sysctl dev.em.5.debug=1 shows rx_int_delay is indeed 0:
> >>>> Oct  1 17:32:41 mango kernel: em5: rx_int_delay = 0, rx_abs_int_delay = 66
> >>>>
> >>>> After a watchdog event, sysctl dev.em.5.debug=1 shows rx_int_delay is
> >>>> now 32:
> >>>> Oct  2 11:29:49 mango kernel: em5: rx_int_delay = 32, rx_abs_int_delay = 66
>>>>
>>>> However, running sysctl dev.em.5 shows it as 0:
>>>> dev.em.5.rx_int_delay: 0
>>>> dev.em.5.tx_int_delay: 66
>>>>
>>>> Seems like the adapter and the kernel don't agree on the rx_int_delay
>>>> value.
>>>>
>>>> Rudy
>>>>
>>>>
>>>>
>>>
>>>
>>
>
>

