Watchdog timeouts and dead network on bge - 6.1-RC1

Robert Watson rwatson at FreeBSD.org
Sun Apr 23 12:41:53 UTC 2006


On Sun, 23 Apr 2006, Lars Erik Gullerud wrote:

> We recently upgraded one of our 4.11 servers to 6.1-RC1. The server is a 
> Dell PE2650, dual Xeons, and has two onboard Broadcom BCM5701 cards, using 
> the bge driver.
>
> Some older threads on -net and -current led me to believe that most issues 
> with the bge driver in FreeBSD >4 had been sorted. However, after our upgrade, 
> we are seeing errors like this:

There's a Dell 2650 in the FreeBSD netperf cluster.  When working with 5.x on 
the box quite a long time ago, I saw similar problems, in which the network 
interface stalled and required kicking to reset.  Unfortunately, this is not 
an issue I have time to work on currently, but if it would help a FreeBSD 
developer track down and debug this problem, I can provide remote access to a 
box that has had the problem in the past, along with serial console, remote 
power, and network booting.  I'll run some tests on it today and see if that 
box still has the same problem or not.  I've never been entirely convinced it 
was actually a bge problem as opposed to an interrupt delivery problem, 
however.  Dmesg fragment below.

Robert N M Watson

Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
         The Regents of the University of California. All rights reserved.
FreeBSD 6.0-CURRENT #1: Sat Jan 29 21:32:42 EST 2005
     rwatson at zoo.freebsd.org:/usr/obj/zoo/rwatson/netperf/src/sys/GENERIC
WARNING: WITNESS option enabled, expect reduced performance.
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) XEON(TM) CPU 2.20GHz (2192.90-MHz 686-class CPU)
   Origin = "GenuineIntel"  Id = 0xf24  Stepping = 4
   Features=0x3febfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM>
real memory  = 2147352576 (2047 MB)
avail memory = 2096799744 (1999 MB)
ACPI APIC Table: <DELL   PE2650  >
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
  cpu0 (BSP): APIC ID:  0
  cpu1 (AP): APIC ID:  6
ioapic0: Changing APIC ID to 8
ioapic1: Changing APIC ID to 9
ioapic2: Changing APIC ID to 10
MADT: Forcing active-low polarity and level trigger for SCI
ioapic0 <Version 1.1> irqs 0-15 on motherboard
ioapic1 <Version 1.1> irqs 16-31 on motherboard
ioapic2 <Version 1.1> irqs 32-47 on motherboard
...
ACPI APIC Table: <DELL   PE2650  >
acpi0: <DELL PE2650> on motherboard
aac0: <Dell PERC 3/Di> mem 0xf0000000-0xf7ffffff irq 30 at device 8.1 on pci4
...
bge0: <Broadcom BCM5701 Gigabit Ethernet, ASIC rev. 0x105> mem 0xfcd10000-0xfcd1ffff irq 28 at device 6.0 on pci3
miibus0: <MII bus> on bge0
bge0: Ethernet address: 00:06:5b:8e:b9:8d
bge1: <Broadcom BCM5701 Gigabit Ethernet, ASIC rev. 0x105> mem 0xfcd00000-0xfcd0ffff irq 29 at device 8.0 on pci3
miibus1: <MII bus> on bge1
bge1: Ethernet address: 00:06:5b:8e:b9:8e


>
> Apr 22 18:44:01 nebula kernel: bge0: watchdog timeout -- resetting
> Apr 22 18:44:01 nebula kernel: bge0: link state changed to DOWN
> Apr 22 18:44:03 nebula kernel: bge0: link state changed to UP
>
> ...and more importantly - when this happens, the network connection does NOT 
> in fact come back up. Logging into the box locally (or via a different 
> network interface) and manually issuing "ifconfig bge0 down ; ifconfig bge0 
> up" DOES get the interface going again, however.
>
> We have only seen this on very high network loads - the particular message 
> included above occurred while transferring some 120GB of data from a 4.11 
> NFS-server to this 6.1-RC1 box.
>
> Is this a known issue in bge? If so, is anyone working on it? Can we provide 
> some useful information to whoever this might be?
>
> We have never had any issues with bge in 4.x, but we really need to get this 
> server up to 5.x/6.x now. Are there any other knobs or workarounds that can 
> give us bge stability?
>
> Thanks in advance,
>
> /leg
>
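
For anyone who wants to spot the stall automatically, here is a minimal sketch (not a fix) that scans syslog lines for the watchdog message quoted above and builds the manual "ifconfig down / up" workaround from the report. The function names and the regular expression are illustrative assumptions, not part of FreeBSD or the bge driver, and the commands are only printed, never executed.

```python
import re

# Assumed pattern, modelled on the quoted log line:
#   Apr 22 18:44:01 nebula kernel: bge0: watchdog timeout -- resetting
WATCHDOG_RE = re.compile(r"kernel: (\w+): watchdog timeout")


def interfaces_with_watchdog_timeouts(log_lines):
    """Return interface names that logged a watchdog timeout, in first-seen order."""
    seen = []
    for line in log_lines:
        match = WATCHDOG_RE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


def reset_commands(interface):
    """Build the manual workaround from the report as argument lists (not run here)."""
    return [["ifconfig", interface, "down"], ["ifconfig", interface, "up"]]


if __name__ == "__main__":
    sample = [
        "Apr 22 18:44:01 nebula kernel: bge0: watchdog timeout -- resetting",
        "Apr 22 18:44:01 nebula kernel: bge0: link state changed to DOWN",
        "Apr 22 18:44:03 nebula kernel: bge0: link state changed to UP",
    ]
    for ifname in interfaces_with_watchdog_timeouts(sample):
        for cmd in reset_commands(ifname):
            print(" ".join(cmd))
```

Running the commands would need root and would only paper over the stall; the sketch is meant for noticing when it happens, for example by feeding it lines from /var/log/messages.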

