Intermittent network issues with FreeBSD 6.2

Jeremy Chadwick koitsu at FreeBSD.org
Wed Feb 28 09:20:01 UTC 2007


On Thu, Feb 15, 2007 at 03:54:18PM +1100, Dimuthu Parussalla wrote:
> Hi,
> 
> Dmesg output related to bge as follows.
> 
> miibus0: <MII bus> on bge0
> brgphy0: <BCM5750 10/100/1000baseTX PHY> on miibus0
> brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX,
> 1000baseTX-FDX, auto
> bge0: Ethernet address: 00:11:25:e9:7f:58
> bge0: [GIANT-LOCKED]
> pcib6: <ACPI PCI-PCI bridge> at device 5.0 on pci0
> pci8: <ACPI PCI bus> on pcib6
> bge1: <Broadcom BCM5750 B1, ASIC rev. 0x4101> mem 0xc6ff0000-0xc6ffffff irq
> 16 at device 0.0 on pci8
> miibus1: <MII bus> on bge1
> brgphy1: <BCM5750 10/100/1000baseTX PHY> on miibus1
> brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX,
> 1000baseTX-FDX, auto
> bge1: Ethernet address: 00:11:25:e9:7f:59
> bge1: [GIANT-LOCKED]

Interestingly enough, this problem just started haunting us too (out
of nowhere), on one of our Supermicro systems.  There haven't been
any changes to the network in literally months (no one's been to the
datacenter since December).

Here's our details:

* Upstream switch is an HP ProCurve 2626.  All ports used are
  100 Mbit, with auto-select enabled (speed/duplex negotiation)
* Speed/duplex negotiation is being done correctly.  We have no
  throughput problems (either direction) or otherwise
* netstat -i -n shows no errors, except for two output errors,
  which are probably due to the interface being brought down
  and back up rudely (see below)
* Switch shows no errors on either interface
* Cabling is good (CAT6, no less)
* Uniprocessor system; kernel not built with SMP

What we see:

Feb 17 11:22:00 eos kernel: bge0: watchdog timeout -- resetting
Feb 17 11:22:00 eos kernel: bge0: link state changed to DOWN
Feb 17 11:22:01 eos kernel: bge0: link state changed to UP
Feb 24 11:20:56 eos kernel: bge0: watchdog timeout -- resetting
Feb 24 11:20:56 eos kernel: bge0: link state changed to DOWN
Feb 24 11:20:58 eos kernel: bge0: link state changed to UP

These timestamps are awfully suspicious: almost exactly 7 days apart,
to within about a minute?  And no, we have no cron jobs or anything else
that runs at that time (this box is hardly used for anything).
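
For anyone who wants to check the spacing between resets on their own
box, here is a rough Python sketch that pulls the watchdog lines out of
a syslog excerpt and prints the gap between consecutive timeouts.  The
function name watchdog_times and the year=2007 default are my own
assumptions (syslog lines carry no year), and the %b month parsing
assumes an English/C locale.

```python
from datetime import datetime

# The two watchdog events quoted above, copied verbatim from the log.
LOG = """\
Feb 17 11:22:00 eos kernel: bge0: watchdog timeout -- resetting
Feb 17 11:22:00 eos kernel: bge0: link state changed to DOWN
Feb 17 11:22:01 eos kernel: bge0: link state changed to UP
Feb 24 11:20:56 eos kernel: bge0: watchdog timeout -- resetting
Feb 24 11:20:56 eos kernel: bge0: link state changed to DOWN
Feb 24 11:20:58 eos kernel: bge0: link state changed to UP
"""


def watchdog_times(log_text, year=2007):
    """Return a datetime for every 'watchdog timeout' line.

    Syslog timestamps have no year, so one is assumed (hypothetical
    default: 2007, the year of this thread).
    """
    times = []
    for line in log_text.splitlines():
        if "watchdog timeout" not in line:
            continue
        # First three whitespace-separated fields: month, day, HH:MM:SS.
        stamp = " ".join(line.split()[:3])
        times.append(datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"))
    return times


times = watchdog_times(LOG)
# Gap between each timeout and the one before it.
gaps = [later - earlier for earlier, later in zip(times, times[1:])]
for gap in gaps:
    print(gap)
```

With only two events this is just one data point, so it can't say
whether the weekly spacing is a real pattern or a coincidence.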

Applicable system information:
(I'm including ichsmb/smbus because it shares an IRQ with bge1;
nothing shares an IRQ with bge0)

bge0: <Broadcom BCM5750 B1, ASIC rev. 0x4101> mem 0xd0100000-0xd010ffff irq 18 at device 0.0 on pci4
miibus0: <MII bus> on bge0
brgphy0: <BCM5750 10/100/1000baseTX PHY> on miibus0
brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
bge0: Ethernet address: 00:30:48:81:fc:8a
pcib5: <ACPI PCI-PCI bridge> irq 19 at device 28.3 on pci0
pci5: <ACPI PCI bus> on pcib5
bge1: <Broadcom BCM5750 B1, ASIC rev. 0x4101> mem 0xd0200000-0xd020ffff irq 19 at device 0.0 on pci5
miibus1: <MII bus> on bge1
brgphy1: <BCM5750 10/100/1000baseTX PHY> on miibus1
brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
bge1: Ethernet address: 00:30:48:81:fc:8b
ichsmb0: <SMBus controller> port 0x500-0x51f irq 19 at device 31.3 on pci0
ichsmb0: [GIANT-LOCKED]
smbus0: <System Management Bus> on ichsmb0
smb0: <SMBus generic I/O> on smbus0
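
To double-check the IRQ sharing claim from boot messages rather than by
eye, a small Python sketch along these lines groups devices by the
"irq N" field in their dmesg probe lines.  The regular expression and
the variable names are my own assumptions about the line layout, based
only on the lines pasted above; other drivers may print things
differently.

```python
import re
from collections import defaultdict

# Probe lines copied from the dmesg output above (one has no irq field).
DMESG = """\
bge0: <Broadcom BCM5750 B1, ASIC rev. 0x4101> mem 0xd0100000-0xd010ffff irq 18 at device 0.0 on pci4
miibus0: <MII bus> on bge0
pcib5: <ACPI PCI-PCI bridge> irq 19 at device 28.3 on pci0
bge1: <Broadcom BCM5750 B1, ASIC rev. 0x4101> mem 0xd0200000-0xd020ffff irq 19 at device 0.0 on pci5
ichsmb0: <SMBus controller> port 0x500-0x51f irq 19 at device 31.3 on pci0
"""

# Device name before the first colon, then the number after "irq".
PROBE_RE = re.compile(r"^(\w+): .*\birq (\d+)\b")

by_irq = defaultdict(list)
for line in DMESG.splitlines():
    match = PROBE_RE.match(line)
    if match:
        device, irq = match.group(1), int(match.group(2))
        by_irq[irq].append(device)

# Keep only the IRQs that more than one device reports.
shared = {irq: devs for irq, devs in by_irq.items() if len(devs) > 1}
print(shared)
```

On the lines above this reports irq 19 as shared (the pcib5 bridge also
lists it, alongside bge1 and ichsmb0) and leaves bge0 alone on irq 18.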

Odd that pciconf -lv shows this as a BCM5750 A1 while the kernel shows
this as a BCM5750 B1.  Is this indicative of anything?

bge0 at pci4:0:0:  class=0x020000 card=0x02c615d9 chip=0x165914e4 rev=0x11 hdr=0x00
    vendor   = 'Broadcom Corporation'
    device   = 'BCM5750A1 NetXtreme Gigabit Ethernet PCI Express'
    class    = network
    subclass = ethernet
bge1 at pci5:0:0:  class=0x020000 card=0x02c615d9 chip=0x165914e4 rev=0x11 hdr=0x00
    vendor   = 'Broadcom Corporation'
    device   = 'BCM5750A1 NetXtreme Gigabit Ethernet PCI Express'
    class    = network
    subclass = ethernet

[jdc at eos ~]$ vmstat -i
interrupt                          total       rate
irq4: sio0                             6          0
irq6: fdc0                            14          0
irq14: ata0                       520782          0
irq15: ata1                           58          0
irq18: bge0                     21839717         11
irq19: bge1+                       32914          0
cpu0: timer                   3638265059       1968
Total                         3660658550       1981

[jdc at eos ~]$ netstat -in
Name    Mtu Network       Address              Ipkts Ierrs    Opkts Oerrs  Coll
bge0   1500 <Link#1>      00:30:48:81:fc:8a 13841423     0 10349370     2     0
bge0   1500 72.20.106/25  72.20.106.2        3590195     - 10348720     -     -
bge0   1500 72.20.106.3/3 72.20.106.3        2075045     -        0     -     -
bge0   1500 72.20.106.4/3 72.20.106.4        2003973     -        0     -     -
bge0   1500 72.20.106.5/3 72.20.106.5        2328549     -        0     -     -
bge0   1500 72.20.106.6/3 72.20.106.6        2006174     -        0     -     -
bge1   1500 <Link#2>      00:30:48:81:fc:8b     3888     0    29600     0     0
bge1   1500 10            10.72.0.1             2605     -     2605     -     -
lo0   16384 <Link#3>                             641     0      641     0     0
lo0   16384 127           127.0.0.1              641     -      641     -     -
bridg  1500 <Link#4>      86:ec:97:73:50:03    26993     0    30885     0     0
tap0   1500 <Link#5>      00:bd:ed:13:00:00    25712     0     1286     0     0

If a developer wants access to this box, I can provide it.  No serial
console at this time (soon, soon...), but can provide root.

-- 
| Jeremy Chadwick                                 jdc at parodius.com |
| Parodius Networking                        http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP: 4BD6C0CB |



More information about the freebsd-stable mailing list