kernel: bge0: watchdog timeout -- resetting

Wed Aug 3 15:49:37 GMT 2005

I had similar problems with the rl (Realtek) driver on 5.3 and 5.4.

A number of things seemed to relieve the problem (disabling ACPI, freeing
up interrupts so that the NIC wasn't sharing one), but it inevitably
came back.

The stalling seemed quite random: I could push tens of gigabytes through,
idle the connection for a day or so, then transfer 200k and it would
fall over.

There did seem to be a small correlation with transfer speed: the higher
the speed, the more likely it was to fall over. Transferring a 100 MB file
at 10 megabytes a second was more likely to fail than transferring a
100 MB file at 100 KB a second.

I didn't do much more digging (I didn't have time, and I don't have the
knowledge without guidance); instead I switched NICs and am now using the
fxp (Intel) driver with no problems.

I had the problems on a single-CPU Celeron box using both the GENERIC
kernel and custom kernels (with unused device drivers stripped out to
free IRQs).

I came across a number of other people via Google having the same
problem, predominantly with the rl driver but also with others. What
seemed to work for some people failed for others.
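
Since fixes that appear to work tend not to last, it may help to count the
resets per day before and after each change instead of going by feel. Below
is a minimal Python sketch of that idea. It assumes the standard syslog line
layout seen in the quoted log further down; the function name and the sample
data are only illustrative, and it has not been run against a real
/var/log/messages.

```python
import re
from collections import Counter

# Matches e.g. "Aug  3 17:25:41 pinserv7 kernel: bge0: watchdog timeout -- resetting"
WATCHDOG = re.compile(
    r"^(\w{3})\s+(\d+)\s+[\d:]+\s+\S+\s+kernel:\s+(\w+): watchdog timeout"
)
# Matches syslog's compressed form, e.g. "... last message repeated 3 times"
REPEAT = re.compile(
    r"^(\w{3})\s+(\d+)\s+[\d:]+\s+\S+\s+last message repeated (\d+) times"
)


def count_watchdog_resets(lines):
    """Return a Counter keyed by (month, day, interface)."""
    counts = Counter()
    last_nic = None  # interface of the previous line, if it was a watchdog line
    for line in lines:
        m = WATCHDOG.match(line)
        if m:
            month, day, nic = m.groups()
            counts[(month, int(day), nic)] += 1
            last_nic = nic
            continue
        m = REPEAT.match(line)
        if m and last_nic is not None:
            # "repeated N times" refers to the line just before it
            month, day, n = m.groups()
            counts[(month, int(day), last_nic)] += int(n)
            continue
        last_nic = None  # any other message breaks the repeat chain
    return counts


# Sample input copied from the log excerpt quoted below.
SAMPLE = [
    "Aug  3 17:25:41 pinserv7 kernel: bge0: watchdog timeout -- resetting",
    "Aug  3 17:41:51 pinserv7 kernel: bge0: watchdog timeout -- resetting",
    "Aug  3 18:01:16 pinserv7 kernel: bge0: watchdog timeout -- resetting",
    "Aug  3 18:10:51 pinserv7 last message repeated 3 times",
]

for (month, day, nic), n in sorted(count_watchdog_resets(SAMPLE).items()):
    print(month, day, nic, n)
```

To use it on a live box, pass an open file instead of SAMPLE, for example
count_watchdog_resets(open("/var/log/messages")). Comparing the daily counts
across a driver, kernel or IRQ change should show whether the change really
made a difference.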

Have you made any hardware changes?

Kev

Raphael H. Becker wrote:
> On Tue, Jul 05, 2005 at 10:02:33AM +0200, Uzi Klein wrote:
> 
>>Ever since I upgraded to 5.4-RELEASE-p3 I get these entries in
>>/var/log/messages once in a while:
>>
>>kernel: bge0: watchdog timeout -- resetting
> 
> 
> Same here with one of our new Dell PE6650:
> 
> Aug  3 17:25:41 pinserv7 kernel: bge0: watchdog timeout -- resetting
> Aug  3 17:41:51 pinserv7 kernel: bge0: watchdog timeout -- resetting
> Aug  3 18:01:16 pinserv7 kernel: bge0: watchdog timeout -- resetting
> Aug  3 18:10:51 pinserv7 last message repeated 3 times
> 
> I've rsynced tons of data via ssh to that box without any problems
> or lack of bandwidth. The card reset while I was accessing a phpinfo()
> page on the webserver: the page itself transferred, the referenced
> <img> (zend logo) didn't. Simultaneously my ssh session stalled for
> about 90 sec and a parallel ping stopped.
> Repeatable!
>  
> Parallel? Race condition?
> 
> I rebooted the machine and everything seems fine now. I'll watch this.
> 
> FreeBSD 5.4-RELEASE-p6 i386
> 
> bge0 at pci8:1:0:  class=0x020000 card=0x01091028 chip=0x164414e4 rev=0x14 hdr=0x00
>     vendor   = 'Broadcom Corporation'
>     device   = 'BCM5700 NetXtreme Gigabit Ethernet Controller'
>     class    = network
>     subclass = ethernet
> bge1 at pci8:2:0:  class=0x020000 card=0x01091028 chip=0x164414e4 rev=0x14 hdr=0x00
>     vendor   = 'Broadcom Corporation'
>     device   = 'BCM5700 NetXtreme Gigabit Ethernet Controller'
>     class    = network
>     subclass = ethernet
> 
> The PE6650 has 4 Xeons with HTT -> 8 logical CPUs. Maybe some kind of
> locking / race conditions? The kernel is GENERIC plus SMP.
> 
> See http://rabe.uugrn.org/FreeBSD/bugs/5.x/pinserv7/dmesg.boot for details
> 
> Any ideas?
> Anyone with similar problems?
> 
> Regards
> Raphael Becker