Watchdog Timeout - bge devices
Scott Long
scottl at samsco.org
Wed Oct 4 05:18:26 UTC 2006
John Marshall wrote:
> $ dmesg | grep bge
> bge0: <Broadcom BCM5705K Gigabit Ethernet, ASIC rev. 0x3003> mem 0xe8200000-0xe820ffff irq 17 at device 4.0 on pci4
> miibus1: <MII bus> on bge0
> bge0: Ethernet address: 00:0b:cd:e7:51:ba
> bge0: watchdog timeout -- resetting
> bge0: link state changed to DOWN
> bge0: link state changed to UP
>
> I initially pronounced the network cable dead and replaced it. Then I
> suspected the FastEthernet switch port and relocated to a different
> port. Watchdog timeouts persisted. I concluded that the bge hardware
> must be flaky until I read a recent thread on em device watchdog
> timeouts which led me to wonder about CPU scheduling.
>
> The server experiencing the bge timeouts was using SCHED_ULE. I built
> 6.2-PRERELEASE on a spare disk and booted the problem server from that
> disk - bge problem persisted.
>
> We have a second (identical) problem-free server configured with
> SCHED_4BSD. I reconfigured both machines so that the first machine (now
> 6.2-PRERELEASE) used SCHED_4BSD and the second machine (6.1-RELEASE)
> used SCHED_ULE. Both machines are configured with PREEMPTION.
>
> +-----------------------------------------------+
> | THE PROBLEM FOLLOWS SCHED_ULE ACROSS MACHINES |
> +-----------------------------------------------+
>
> The machines are hp ProLiant ML110 servers.
>
> There is nothing sharing the interrupt with the bge device. No USB
> drivers are loaded.
>
>
> $ vmstat -i
> interrupt          total       rate
> irq1: atkbd0          70          0
> irq6: fdc0             9          0
> irq14: ata0      1234430          6
> irq15: ata1           47          0
> irq17: bge0     17543591         93
> irq26: fxp0        70832          0
> cpu0: timer    376381765       1999
> Total          395230744       2099
>
>
> $ sysctl kern.version kern.sched kern.smp hw.machine hw.model dev.bge
> kern.version: FreeBSD 6.1-RELEASE-p10 #1: Mon Oct 2 08:36:56 AEST 2006
>
> kern.sched.name: ule
> kern.sched.slice_min: 10
> kern.sched.slice_max: 142
> kern.sched.preemption: 1
> kern.smp.maxcpus: 1
> kern.smp.active: 0
> kern.smp.disabled: 0
> kern.smp.cpus: 1
> hw.machine: i386
> hw.model: Intel(R) Pentium(R) 4 CPU 2.80GHz
> dev.bge.0.%desc: Broadcom BCM5705K Gigabit Ethernet, ASIC rev. 0x3003
> dev.bge.0.%driver: bge
> dev.bge.0.%location: slot=4 function=0
> dev.bge.0.%pnpinfo: vendor=0x14e4 device=0x1654 subvendor=0x103c subdevice=0x1654 class=0x020000
> dev.bge.0.%parent: pci4
>
> Is there any other information I ought to post to help with diagnosis -
> or is this a known problem? (I've only subscribed recently)
>
> John Marshall.
Very interesting data point. I wonder if this accounts for some of the
inconsistency in the reporting from others. In any case, SCHED_ULE is
still considered to be highly experimental. Hopefully it will get some
more attention in the near future to bring it closer to production
quality.
Scott