kern/68351: bge0 watchdog timeout on 5.2.1 and -current, 5.1 is ok

Vadim Mikhailov freebsd-bugs at mikhailov.org
Sat Jun 26 02:11:26 GMT 2004


>Number:         68351
>Category:       kern
>Synopsis:       bge0 watchdog timeout on 5.2.1 and -current, 5.1 is ok
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Jun 26 02:10:23 GMT 2004
>Closed-Date:
>Last-Modified:
>Originator:     Vadim Mikhailov
>Release:        FreeBSD 5.2.1-RELEASE-p8 i386
>Organization:
>Environment:
System: FreeBSD vortex.xxx.com 5.2.1-RELEASE-p8 FreeBSD 5.2.1-RELEASE-p8 #0: Thu Jun 25 11:57:42 PST 2003 xxx at vortex.xxx.com:/usr/obj/usr/src/sys/VORTEX i386

>Description:

I have a Dell PowerEdge 1750 server with two 3.0 GHz Xeon CPUs, 4 GB RAM, and two onboard gigabit Ethernet ports:

bge0: <Broadcom BCM5704C Dual Gigabit Ethernet, ASIC rev. 0x2002> mem 0xfcd20000-0xfcd2ffff,0xfcd30000-0xfcd3ffff irq 17 at device 0.0 on pci2
bge1: <Broadcom BCM5704C Dual Gigabit Ethernet, ASIC rev. 0x2002> mem 0xfcd00000-0xfcd0ffff,0xfcd10000-0xfcd1ffff irq 18 at device 0.1 on pci2
      
Only bge0 is used, with jumbo frames (my gigabit switch, a PowerConnect 5224, supports them):

bge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 9000
    options=1b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING>
    inet 172.xx.xx.xx netmask 0xfffff800 broadcast 172.xx.xx.255
    ether 00:06:5b:ef:63:e6
    media: Ethernet autoselect (1000baseTX <full-duplex>)
    status: active

This box has two dual-port SCSI adapters:

mpt0: <LSILogic 1030 Ultra4 Adapter> port 0xbc00-0xbcff mem 0xfcb20000-0xfcb2ffff,0xfcb30000-0xfcb3ffff irq 13 at device 5.0 on pci4
mpt1: <LSILogic 1030 Ultra4 Adapter> port 0xb800-0xb8ff mem 0xfcb00000-0xfcb0ffff,0xfcb10000-0xfcb1ffff irq 16 at device 5.1 on pci4
ahc0: <Adaptec 3960D Ultra160 SCSI adapter> port 0xdc00-0xdcff mem 0xfcf01000-0xfcf01fff irq 19 at device 4.0 on pci1
ahc1: <Adaptec 3960D Ultra160 SCSI adapter> port 0xd800-0xd8ff mem 0xfcf00000-0xfcf00fff irq 20 at device 4.1 on pci1

Each adapter has disks attached. Firmware on the motherboard and all peripheral
devices has been upgraded to the latest versions from Dell.
This setup works reasonably well under FreeBSD 5.1-RELEASE-p8 (GENERIC kernel with SMP enabled),
but every month or two the machine reboots under load, so I want to upgrade it to 5.2.1-RELEASE.
However, when I boot a 5.2.1-RELEASE or later kernel (-current) on this box, the network adapter locks up.
I see these messages on the console and in the logs:

Jun 25 15:25:22 vortex kernel: bge0: watchdog timeout -- resetting
						   
If I do "ifconfig bge0 down up", the network becomes available for a few seconds and then
the machine stops responding to pings again. I ran "systat -v" and noticed that ping stops
working exactly when any interrupt arrives for mpt or ahc (i.e. on any disk activity).
						   
One visible difference between 5.1 (where it works) and 5.2.1/current (where it doesn't)
is that interrupts to PCI devices are getting assigned differently:

IRQ map under 5.1: mpt0 13, mpt1 16, bge0 17, bge1 18, ahc0 19, ahc1 20,
  and under 5.2.1: mpt0 18, mpt1 19, bge0 16, bge1 17, ahc0 20, ahc1 21.
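
For readers who want to compare the two maps mechanically, here is a minimal
Python sketch. The IRQ numbers are copied from the maps above, with the second
bge entry under 5.1 taken as bge1 to match the probe messages. The names
irq_51, irq_521, irq_changes and reused_irqs are illustrative only and do not
come from any FreeBSD tool.

```python
# IRQ assignments copied from the two maps in this report.
# All identifiers below are hypothetical helpers for illustration.
irq_51 = {"mpt0": 13, "mpt1": 16, "bge0": 17, "bge1": 18, "ahc0": 19, "ahc1": 20}
irq_521 = {"mpt0": 18, "mpt1": 19, "bge0": 16, "bge1": 17, "ahc0": 20, "ahc1": 21}


def irq_changes(old, new):
    """Return {device: (old_irq, new_irq)} for devices whose IRQ differs."""
    return {
        dev: (old[dev], new[dev])
        for dev in old
        if dev in new and old[dev] != new[dev]
    }


def reused_irqs(old, new):
    """Return {irq: (old_device, new_device)} for IRQs now held by another device."""
    old_by_irq = {irq: dev for dev, irq in old.items()}
    return {
        irq: (old_by_irq[irq], dev)
        for dev, irq in new.items()
        if irq in old_by_irq and old_by_irq[irq] != dev
    }


if __name__ == "__main__":
    for dev, (before, after) in sorted(irq_changes(irq_51, irq_521).items()):
        print(f"{dev}: irq {before} -> irq {after}")
    for irq, (was, now) in sorted(reused_irqs(irq_51, irq_521).items()):
        print(f"irq {irq}: was {was}, now {now}")
```

This only restates the observation that every device moved to a different IRQ;
it does not by itself show that interrupt routing causes the timeouts.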

I have tried changing the IRQ assignment of the PCI devices in the BIOS, but it didn't change
anything from FreeBSD's point of view. I have also tried booting 5.2.1 with ACPI disabled;
the result is the same. Disabling jumbo frames does not seem to have any effect either.
I also tried this on another identical 1750 box (I have a few of them), with the same result.
The hardware works fine under Linux kernel 2.4.18.

>How-To-Repeat:
      Install FreeBSD 5.2.1-RELEASE (or -current) on a Dell PowerEdge 1750,
connect bge0 to a gigabit switch, and you will see bge0 watchdog timeouts.
FreeBSD 5.1-RELEASE and Linux 2.4.18 work fine on the same hardware.

>Fix:

>Release-Note:
>Audit-Trail:
>Unformatted:

