kern/123172: Watchdog timeout problems with if_bce

Josh josh at endries.org
Mon Apr 28 14:20:03 UTC 2008


>Number:         123172
>Category:       kern
>Synopsis:       Watchdog timeout problems with if_bce
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Apr 28 14:20:01 UTC 2008
>Closed-Date:
>Last-Modified:
>Originator:     Josh
>Release:        7.0-RELEASE
>Organization:
>Environment:
FreeBSD 7.0-RELEASE/amd64, custom kernel (SMP with SCHED_ULE and MAC)
>Description:
The machine doesn't lock up outright, but it becomes unusable: the network stops working entirely, and so does anything that touches the network stack. For example, running ifconfig hangs my shell. When that happened I logged in on another tty and ran "sysctl -a | grep watchdog", which locked up the whole machine. I couldn't switch back to the initial tty, use ctrl-alt-del, or do anything else, and had to hard reset. These are the messages in syslog:

Apr 28 00:00:04 hathor kernel: bce0: /jails/src/usr/src/sys/dev/bce/if_bce.c(5244): Watchdog timeout occurred, resetting!
Apr 28 00:00:04 hathor kernel: bce0: link state changed to DOWN
Apr 28 00:00:07 hathor kernel: bce0: link state changed to UP
Apr 28 00:00:14 hathor kernel: bce1: /jails/src/usr/src/sys/dev/bce/if_bce.c(5244): Watchdog timeout occurred, resetting!
Apr 28 00:00:14 hathor kernel: bce1: link state changed to DOWN
Apr 28 00:00:16 hathor kernel: bce1: link state changed to UP
Apr 28 00:00:18 hathor kernel: bce0: /jails/src/usr/src/sys/dev/bce/if_bce.c(5244): Watchdog timeout occurred, resetting!
Apr 28 00:00:18 hathor kernel: bce0: link state changed to DOWN
Apr 28 00:00:21 hathor kernel: bce0: link state changed to UP
Apr 28 00:00:23 hathor kernel: bce1: /jails/src/usr/src/sys/dev/bce/if_bce.c(5244): Watchdog timeout occurred, resetting!
Apr 28 00:00:23 hathor kernel: bce1: link state changed to DOWN
Apr 28 00:00:25 hathor kernel: bce1: link state changed to UP
Apr 28 00:00:28 hathor kernel: bce0: /jails/src/usr/src/sys/dev/bce/if_bce.c(5244): Watchdog timeout occurred, resetting!
..

This just repeats. It seems to happen under significant traffic, and UDP traffic may trigger or aggravate it. The machine currently runs a MySQL slave jail and a BIND jail; it worked fine until I started using BIND, and the slave isn't very bandwidth intensive. It was fine for a few days, then died, and now it seems to die much more often (possibly because BIND is now in use). I can't get into it right now to get a uname (it's remote, and it broke again a few minutes ago), but I did get a dmesg (below) before it broke.
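
To gauge how often each interface resets, the syslog lines above can be tallied with a short script. This is only a minimal sketch, assuming the message format shown in the excerpt; the function name count_watchdog_resets and the SAMPLE lines are illustrative and not part of any FreeBSD tool.

```python
import re
from collections import Counter

# Matches lines such as:
#   Apr 28 00:00:04 hathor kernel: bce0: /path/if_bce.c(5244): Watchdog timeout occurred, resetting!
# The pattern is an assumption based on the excerpt in this report.
WATCHDOG_RE = re.compile(r"kernel: (bce\d+): .*Watchdog timeout occurred")


def count_watchdog_resets(lines):
    """Return a Counter mapping interface name to its number of watchdog resets."""
    counts = Counter()
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# A few lines copied from the syslog excerpt, used as sample input.
SAMPLE = """\
Apr 28 00:00:04 hathor kernel: bce0: /jails/src/usr/src/sys/dev/bce/if_bce.c(5244): Watchdog timeout occurred, resetting!
Apr 28 00:00:04 hathor kernel: bce0: link state changed to DOWN
Apr 28 00:00:07 hathor kernel: bce0: link state changed to UP
Apr 28 00:00:14 hathor kernel: bce1: /jails/src/usr/src/sys/dev/bce/if_bce.c(5244): Watchdog timeout occurred, resetting!
""".splitlines()

if __name__ == "__main__":
    for ifname, resets in sorted(count_watchdog_resets(SAMPLE).items()):
        print(ifname, resets)
```

Feeding it the full log file (for example, the lines of /var/log/messages) instead of SAMPLE would show whether both interfaces reset at a similar rate.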

I currently have it set up to use LACP via lagg, with vlan devices on top of that. I'm also doing some unusual things with pf (route-to/reply-to/NAT for jails). I'm going to simplify the setup: one NIC external and one internal, with real jail IPs, to see if that helps. Unfortunately this is pretty much a showstopper. :( If any tests, information, shell access, or contact info would help someone work on this, please let me know.

---

Copyright (c) 1992-2008 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.0-RELEASE #0: Tue Mar 24 13:36:33 EDT 2009
    root at hathor.production.pyramid:/jails/src/usr/obj/jails/src/usr/src/sys/ULEMAC
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(R) CPU           E5420  @ 2.50GHz (2500.11-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x10676  Stepping = 6
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0xce3bd<SSE3,RSVD2,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA,<b19>>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>
  Cores per package: 4
usable memory = 8575201280 (8177 MB)
avail memory  = 8287870976 (7903 MB)
ACPI APIC Table: <HP     ProLiant>
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  2
 cpu3 (AP): APIC ID:  3
 cpu4 (AP): APIC ID:  4
 cpu5 (AP): APIC ID:  5
 cpu6 (AP): APIC ID:  6
 cpu7 (AP): APIC ID:  7
ioapic0 <Version 2.0> irqs 0-23 on motherboard
ioapic1 <Version 2.0> irqs 24-47 on motherboard
kbd1 at kbdmux0
ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
hptrr: HPT RocketRAID controller driver v1.1 (Mar 24 2009 13:36:25)
acpi0: <HP ProLiant> on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x908-0x90b on acpi0
acpi_hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 900
cpu0: <ACPI CPU> on acpi0
est0: <Enhanced SpeedStep Frequency Control> on cpu0
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 4720472006004720
device_attach: est0 attach returned 6
p4tcc0: <CPU Frequency Thermal Control> on cpu0
cpu1: <ACPI CPU> on acpi0
est1: <Enhanced SpeedStep Frequency Control> on cpu1
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 4720472006004720
device_attach: est1 attach returned 6
p4tcc1: <CPU Frequency Thermal Control> on cpu1
cpu2: <ACPI CPU> on acpi0
est2: <Enhanced SpeedStep Frequency Control> on cpu2
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 4720472006004720
device_attach: est2 attach returned 6
p4tcc2: <CPU Frequency Thermal Control> on cpu2
cpu3: <ACPI CPU> on acpi0
est3: <Enhanced SpeedStep Frequency Control> on cpu3
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 4720472006004720
device_attach: est3 attach returned 6
p4tcc3: <CPU Frequency Thermal Control> on cpu3
cpu4: <ACPI CPU> on acpi0
est4: <Enhanced SpeedStep Frequency Control> on cpu4
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 4720472006004720
device_attach: est4 attach returned 6
p4tcc4: <CPU Frequency Thermal Control> on cpu4
cpu5: <ACPI CPU> on acpi0
est5: <Enhanced SpeedStep Frequency Control> on cpu5
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 4720472006004720
device_attach: est5 attach returned 6
p4tcc5: <CPU Frequency Thermal Control> on cpu5
cpu6: <ACPI CPU> on acpi0
est6: <Enhanced SpeedStep Frequency Control> on cpu6
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 4720472006004720
device_attach: est6 attach returned 6
p4tcc6: <CPU Frequency Thermal Control> on cpu6
cpu7: <ACPI CPU> on acpi0
est7: <Enhanced SpeedStep Frequency Control> on cpu7
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 4720472006004720
device_attach: est7 attach returned 6
p4tcc7: <CPU Frequency Thermal Control> on cpu7
pcib0: <ACPI Host-PCI bridge> on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> at device 2.0 on pci0
pci9: <ACPI PCI bus> on pcib1
pcib2: <ACPI PCI-PCI bridge> at device 0.0 on pci9
pci10: <ACPI PCI bus> on pcib2
pcib3: <ACPI PCI-PCI bridge> at device 0.0 on pci10
pci11: <ACPI PCI bus> on pcib3
pcib4: <PCI-PCI bridge> at device 1.0 on pci10
pci14: <PCI bus> on pcib4
pcib5: <PCI-PCI bridge> at device 2.0 on pci10
pci15: <PCI bus> on pcib5
pcib6: <ACPI PCI-PCI bridge> at device 0.3 on pci9
pci16: <ACPI PCI bus> on pcib6
pcib7: <ACPI PCI-PCI bridge> at device 3.0 on pci0
pci6: <ACPI PCI bus> on pcib7
ciss0: <HP Smart Array P400i> port 0x4000-0x40ff mem 0xfde00000-0xfdefffff,0xfddf0000-0xfddf0fff irq 16 at device 0.0 on pci6
ciss0: [ITHREAD]
pcib8: <ACPI PCI-PCI bridge> at device 4.0 on pci0
pci19: <ACPI PCI bus> on pcib8
pcib9: <PCI-PCI bridge> at device 5.0 on pci0
pci22: <PCI bus> on pcib9
pcib10: <ACPI PCI-PCI bridge> at device 6.0 on pci0
pci2: <ACPI PCI bus> on pcib10
pcib11: <ACPI PCI-PCI bridge> at device 0.0 on pci2
pci3: <ACPI PCI bus> on pcib11
bce0: <Broadcom NetXtreme II BCM5708 1000Base-T (B2)> mem 0xf8000000-0xf9ffffff irq 18 at device 0.0 on pci3
miibus0: <MII bus> on bce0
brgphy0: <BCM5708C 10/100/1000baseTX PHY> PHY 1 on miibus0
brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
bce0: Ethernet address: 00:1f:29:06:d9:e2
bce0: [ITHREAD]
bce0: ASIC (0x57081020); Rev (B2); Bus (PCI-X, 64-bit, 133MHz); F/W (0x01090605); Flags( MFW MSI )
pcib12: <ACPI PCI-PCI bridge> at device 7.0 on pci0
pci4: <ACPI PCI bus> on pcib12
pcib13: <ACPI PCI-PCI bridge> at device 0.0 on pci4
pci5: <ACPI PCI bus> on pcib13
bce1: <Broadcom NetXtreme II BCM5708 1000Base-T (B2)> mem 0xfa000000-0xfbffffff irq 19 at device 0.0 on pci5
miibus1: <MII bus> on bce1
brgphy1: <BCM5708C 10/100/1000baseTX PHY> PHY 1 on miibus1
brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
bce1: Ethernet address: 00:1f:29:06:d9:e0
bce1: [ITHREAD]
bce1: ASIC (0x57081020); Rev (B2); Bus (PCI-X, 64-bit, 133MHz); F/W (0x01090605); Flags( MFW MSI )
uhci0: <Intel 631XESB/632XESB/3100 USB controller USB-1> port 0x1000-0x101f irq 16 at device 29.0 on pci0
uhci0: [GIANT-LOCKED]
uhci0: [ITHREAD]
usb0: <Intel 631XESB/632XESB/3100 USB controller USB-1> on uhci0
usb0: USB revision 1.0
uhub0: <Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb0
uhub0: 2 ports with 2 removable, self powered
uhci1: <Intel 631XESB/632XESB/3100 USB controller USB-2> port 0x1020-0x103f irq 17 at device 29.1 on pci0
uhci1: [GIANT-LOCKED]
uhci1: [ITHREAD]
usb1: <Intel 631XESB/632XESB/3100 USB controller USB-2> on uhci1
usb1: USB revision 1.0
uhub1: <Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb1
uhub1: 2 ports with 2 removable, self powered
uhci2: <Intel 631XESB/632XESB/3100 USB controller USB-3> port 0x1040-0x105f irq 18 at device 29.2 on pci0
uhci2: [GIANT-LOCKED]
uhci2: [ITHREAD]
usb2: <Intel 631XESB/632XESB/3100 USB controller USB-3> on uhci2
usb2: USB revision 1.0
uhub2: <Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb2
uhub2: 2 ports with 2 removable, self powered
uhci3: <Intel 631XESB/632XESB/3100 USB controller USB-4> port 0x1060-0x107f irq 19 at device 29.3 on pci0
uhci3: [GIANT-LOCKED]
uhci3: [ITHREAD]
usb3: <Intel 631XESB/632XESB/3100 USB controller USB-4> on uhci3
usb3: USB revision 1.0
uhub3: <Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb3
uhub3: 2 ports with 2 removable, self powered
ehci0: <Intel 63XXESB USB 2.0 controller> mem 0xf7df0000-0xf7df03ff irq 16 at device 29.7 on pci0
ehci0: [GIANT-LOCKED]
ehci0: [ITHREAD]
usb4: waiting for BIOS to give up control
usb4: EHCI version 1.0
usb4: companion controllers, 2 ports each: usb0 usb1 usb2 usb3
usb4: <Intel 63XXESB USB 2.0 controller> on ehci0
usb4: USB revision 2.0
uhub4: <Intel EHCI root hub, class 9/0, rev 2.00/1.00, addr 1> on usb4
uhub4: 8 ports with 8 removable, self powered
pcib14: <ACPI PCI-PCI bridge> at device 30.0 on pci0
pci1: <ACPI PCI bus> on pcib14
vgapci0: <VGA-compatible display> port 0x3000-0x30ff mem 0xd8000000-0xdfffffff,0xf7ff0000-0xf7ffffff irq 23 at device 3.0 on pci1
pci1: <base peripheral> at device 4.0 (no driver attached)
pci1: <base peripheral> at device 4.2 (no driver attached)
uhci4: <UHCI (generic) USB controller> port 0x3800-0x381f irq 22 at device 4.4 on pci1
uhci4: [GIANT-LOCKED]
uhci4: [ITHREAD]
usb5: <UHCI (generic) USB controller> on uhci4
usb5: USB revision 1.0
uhub5: <(0x103c) UHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb5
uhub5: 2 ports with 2 removable, self powered
pci1: <serial bus> at device 4.6 (no driver attached)
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel 63XXESB2 UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x500-0x50f irq 17 at device 31.1 on pci0
ata0: <ATA channel 0> on atapci0
ata0: [ITHREAD]
ata1: <ATA channel 1> on atapci0
ata1: [ITHREAD]
acpi_tz0: <Thermal Zone> on acpi0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: [ITHREAD]
psm0: model IntelliMouse, device ID 3
sio0: configured irq 4 not in bitmap of probed irqs 0
sio0: port may not be enabled
sio0: configured irq 4 not in bitmap of probed irqs 0
sio0: port may not be enabled
sio0: <Standard PC COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio0: [FILTER]
orm0: <ISA Option ROMs> at iomem 0xc0000-0xcafff,0xe6000-0xe7fff on isa0
ppc0: cannot reserve I/O port range
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
sio1 at port 0x2f8-0x2ff irq 3 on isa0
sio1: type 16550A
sio1: [FILTER]
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
ukbd0: <HP Virtual Keyboard, class 0/0, rev 1.10/0.02, addr 2> on uhub5
kbd2 at ukbd0
ums0: <HP Virtual Keyboard, class 0/0, rev 1.10/0.02, addr 2> on uhub5
ums0: 3 buttons.
uhub6: <HP Virtual Hub, class 9/0, rev 1.10/0.01, addr 3> on uhub5
uhub6: 7 ports with 7 removable, self powered
NULL mp in getnewvnode()
Timecounters tick every 1.000 msec
hptrr: no controller detected.
acd0: DVDROM <HL-DT-STDVD-ROM GDR-D10N/3.00> at ata0-master UDMA33
SMP: AP CPU #2 Launched!
SMP: AP CPU #1 Launched!
SMP: AP CPU #3 Launched!
SMP: AP CPU #5 Launched!
SMP: AP CPU #4 Launched!
SMP: AP CPU #6 Launched!
SMP: AP CPU #7 Launched!
da0 at ciss0 bus 0 target 0 lun 0
da0: <COMPAQ RAID 5  VOLUME OK> Fixed Direct Access SCSI-5 device
da0: 135.168MB/s transfers
da0: 419946MB (860051248 512 byte sectors: 255H 32S/T 65535C)
Trying to mount root from ufs:/dev/da0s1a
WARNING: / was not properly dismounted
bce0: link state changed to UP
lagg0: link state changed to UP
vlan2: link state changed to UP
vlan8: link state changed to UP
vlan11: link state changed to UP
vlan12: link state changed to UP
bce1: link state changed to UP
>How-To-Repeat:
Not sure yet; generating UDP traffic seems to trigger it.
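
For anyone attempting a reproduction, sustained UDP traffic toward the affected host could be generated with something like the sketch below. It is not a confirmed reproducer: the trigger is still uncertain, and the function name send_udp_burst, the datagram count, and the payload size are all assumptions chosen for illustration.

```python
import socket


def send_udp_burst(host, port, count=1000, payload_size=512):
    """Send `count` UDP datagrams of `payload_size` bytes to (host, port).

    Returns the total number of payload bytes handed to the kernel.
    """
    payload = b"\x00" * payload_size
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for _ in range(count):
            # sendto() returns the number of bytes sent for this datagram.
            sent += sock.sendto(payload, (host, port))
    return sent
```

A call such as send_udp_burst("192.0.2.10", 9, count=100000) would aim the traffic at a test machine; the address and port here are placeholders, not values from this report.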
>Fix:


>Release-Note:
>Audit-Trail:
>Unformatted:

