Problems with bge (possibly related to r208993)

Pyun YongHyeon pyunyh at gmail.com
Tue Jun 15 17:50:49 UTC 2010


On Tue, Jun 15, 2010 at 07:09:27AM +0400, Artem Kim wrote:
> On Tuesday 15 June 2010 01:03:43 Pyun YongHyeon wrote:
> > On Sun, Jun 13, 2010 at 07:34:11PM +0400, Artem Kim wrote:
> > > Hi,
> > >
> > > I have two routers (HP DL140G3):
> > >
> > > NAS3 FreeBSD 8.1-PRERELEASE #0: Thu Jun 3 04:13:07 MSD 2010 i386
> > > NAS2 FreeBSD 8.1-PRERELEASE #0: Sat Jun 12 16:42:19 UTC 2010 i386
> > > (r208993 included)
> > >
> > > bge0@pci0:19:0:0: class=0x020000 card=0x3260103c chip=0x165914e4
> > > rev=0x11 hdr=0x00
> > >     vendor = 'Broadcom Corporation'
> > >     device = 'NetXtreme Gigabit Ethernet PCI Express (BCM5721)'
> > >     class = network
> > >     subclass = ethernet
> > > bge1@pci0:20:0:0: class=0x020000 card=0x3260103c chip=0x165914e4
> > > rev=0x11 hdr=0x00
> > >     vendor = 'Broadcom Corporation'
> > >     device = 'NetXtreme Gigabit Ethernet PCI Express (BCM5721)'
> > >     class = network
> > >     subclass = ethernet
> > >
> > >
> > > I have some problems with bge on NAS2.
> > >
> > > After some time (about 15 hours) bge1 stops passing traffic.
> > > NAS3 is a PPPoE server. Only IP traffic passes through bge1;
> > > bge0 carries no IP traffic.
> > > Problems occur only with the bge1 interface on NAS2.
> > >
> > >
> > > Traffic does not pass through bge1 until I run "ifconfig bge1 down"
> > > followed by "ifconfig bge1 up".
> > >
> > > When I run "ifconfig bge1 down", the NIC does not shut down:
> > >
> > > nas2 # ifconfig bge1 down
> > > nas2 #
> > > nas2 # ifconfig bge1
> > > bge1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
> > >         options=8009b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LINKSTATE>
> > >         ether XXXXXXXXXXXXX
> > >         inet YYYYYYYYYYY netmask 0xffffffc0 broadcast YYYYYYYYYYYY
> > >         media: Ethernet autoselect (1000baseT <full-duplex>)
> > >         status: active
> > >
> > > LED also indicates that the NIC is active.
> > >
> > > I have left the NAS in the "frozen bge1" state and can provide
> > > additional information for diagnosis.
> > 
> > Try running tcpdump on bge1 to see whether the driver still sees
> > incoming traffic. Also show me the output of "netstat -ndI bge1"
> > and the output of "sysctl dev.bge.1.stats". Verbose dmesg output
> > would also be helpful.
> 
> nas2 # netstat -ndI bge1
> Name  Mtu Network     Address               Ipkts   Ierrs Idrop     Opkts Oerrs Coll Drop
> bge1 1500 <Link#3>    00:1b:78:a3:3c:01 418543876 1972918     0 446063237     0    0    0
> bge1 1500 XX.XX.6.12  XX.XX.6.133          890306       -     -   1076833     -    -    -
> 

OK, I see a very large number of Ierrs here. When you send some packets
from other hosts to nas2 (bge1), do you see the Ierrs counter
increasing?
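
A rough sketch of one way to check this: sample the link-level row of
"netstat -ndI bge1" twice and compare the Ierrs column. The function
names below are hypothetical, and the parser assumes the column order
shown in the netstat output quoted above (Ierrs is the sixth field and
the Address column is present).

```python
import subprocess
import time


def link_ierrs(netstat_output, ifname):
    """Return Ierrs from the <Link#N> row of `netstat -ndI <ifname>` output.

    Assumed column order (taken from the output quoted in this thread):
    Name Mtu Network Address Ipkts Ierrs Idrop Opkts Oerrs Coll Drop
    """
    for line in netstat_output.splitlines():
        fields = line.split()
        if (len(fields) >= 6 and fields[0] == ifname
                and fields[2].startswith("<Link#")):
            return int(fields[5])
    raise ValueError("no link-level row found for " + ifname)


def sample(ifname):
    # Runs the same netstat command used in this thread.
    out = subprocess.run(["netstat", "-ndI", ifname],
                         capture_output=True, text=True, check=True).stdout
    return link_ierrs(out, ifname)


def ierrs_delta(ifname, seconds=10):
    """Return how much Ierrs grew over `seconds`."""
    before = sample(ifname)
    time.sleep(seconds)  # send traffic to the interface from another host meanwhile
    return sample(ifname) - before
```

Calling ierrs_delta("bge1") on nas2 while another host pings it would
return a positive number if incoming frames are being counted as input
errors.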

> 
> Should I add additional debugging options?
> 
> nas2 # sysctl dev.bge.1.stats
> sysctl: unknown oid 'dev.bge.1.stats'
> 
> nas2 # sysctl dev.bge.1
> dev.bge.1.%desc: Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev.
> 0x004101
> dev.bge.1.%driver: bge
> dev.bge.1.%location: slot=0 function=0
> dev.bge.1.%pnpinfo: vendor=0x14e4 device=0x1659 subvendor=0x103c
> subdevice=0x3260 class=0x020000
> dev.bge.1.%parent: pci20
> dev.bge.1.forced_collapse: 0
> 

Ok, this controller lacks hardware MAC statistics counters.

> I can provide a verbose dmesg, but that requires a reboot, which would
> take bge1 out of its current state.
> 

Ok.

> 
> I ran tcpdump on NAS2 and saw only the ARP requests sent by NAS2
> (NAS2 is XX.XX.6.133):
> 
> nas2 # tcpdump -i bge1
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on bge1, link-type EN10MB (Ethernet), capture size 96 bytes
> 01:23:43.063238 ARP, Request who-has XX.XX.6.129 tell XX.XX.6.133, length 28
> 01:23:43.162257 ARP, Request who-has XX.XX.6.129 tell XX.XX.6.133, length 28
> 01:23:43.935016 ARP, Request who-has XX.XX.6.129 tell XX.XX.6.133, length 28
> 
> 
> XX.XX.6.129 is the L3 switch (AT-X900) and the default router for NAS2;
> bge1 is directly connected to the x900.
> 
> I ran tcpdump on the x900:
> 
> awplus # tcpdump -ni vlanXX host XX.XX.6.133
> 05:36:30.455642 arp who-has XX.XX.6.129 tell XX.XX.6.133
> 05:36:30.455898 arp reply XX.XX.6.129 is-at 00:00:cd:29:6e:09
> 05:36:31.483353 arp who-has XX.XX.6.129 tell XX.XX.6.133
> 05:36:31.483505 arp reply XX.XX.6.129 is-at 00:00:cd:29:6e:09
> 05:36:32.511260 arp who-has XX.XX.6.129 tell XX.XX.6.133
> 05:36:32.511353 arp reply XX.XX.6.129 is-at 00:00:cd:29:6e:09
> 05:36:33.539163 arp who-has XX.XX.6.129 tell XX.XX.6.133
> 
> The switch sees the ARP requests from NAS2 (XX.XX.6.133) and replies to
> them, but on NAS2 I can _only_ see the ARP requests from NAS2 itself.
> 
> I added a static ARP entry on NAS2 and pinged XX.XX.6.129.
> 
> Then I looked again at tcpdump on XX.XX.6.129:
> 
> awplus # tcpdump -nei vlanXX host XX.XX.6.133
> 06:13:03.472539 00:00:cd:29:6e:09 > ff:ff:ff:ff:ff:ff, ethertype ARP
> (0x0806), length 42: arp who-has XX.XX.6.133 tell XX.XX.6.129
> 06:13:03.526768 00:1b:78:a3:3c:01 > 00:00:cd:29:6e:09, ethertype IPv4
> (0x0800), length 98: XX.XX.6.133 > XX.XX.6.129: ICMP echo request, id 6958, seq
> 1920, length 64
> 06:13:04.553728 00:1b:78:a3:3c:01 > 00:00:cd:29:6e:09, ethertype IPv4
> (0x0800), length 98: XX.XX.6.133 > XX.XX.6.129: ICMP echo request, id 6958, seq
> 1921, length 64
> 06:13:04.554495 00:00:cd:29:6e:09 > ff:ff:ff:ff:ff:ff, ethertype ARP
> (0x0806), length 42: arp who-has XX.XX.6.133 tell XX.XX.6.129
> 06:13:05.554486 00:00:cd:29:6e:09 > ff:ff:ff:ff:ff:ff, ethertype ARP
> (0x0806), length 42: arp who-has XX.XX.6.133 tell XX.XX.6.129
> 06:13:05.581488 00:1b:78:a3:3c:01 > 00:00:cd:29:6e:09, ethertype IPv4
> (0x0800), length 98: XX.XX.6.133 > XX.XX.6.129: ICMP echo request, id 6958, seq
> 1922, length 64
> 
> Traffic from XX.XX.6.133 reaches XX.XX.6.129. I tried adding an ARP entry
> on XX.XX.6.129 (XX.XX.6.133 -> 001b.78a3.3c01), but communication was not
> restored.

It seems RX does not work at all. Because Drop is zero (from
netstat), I don't think you hit an mbuf resource shortage. The
Ierrs counter is incremented whenever the controller drops a frame
due to a receive error (e.g. CRC). Assuming you have no cabling
issue, the errors could be caused by a speed/duplex mismatch between
bge1 and its link partner. Does the link partner agree with the
speed/duplex that bge1 resolved?
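
As a rough sketch of that comparison, the resolved speed and duplex
can be pulled from the "media:" line of ifconfig output, like the one
quoted earlier in this thread, and then compared by hand with what the
switch port reports. The function name is hypothetical, and the
regular expression assumes the "media: Ethernet autoselect (...)"
layout shown above.

```python
import re

# Matches e.g. "media: Ethernet autoselect (1000baseT <full-duplex>)".
MEDIA_RE = re.compile(
    r"media:\s+Ethernet\s+\S+\s+\((\S+)(?:\s+<([^>]*)>)?\)")


def resolved_media(ifconfig_output):
    """Return (speed, duplex) resolved by the NIC; duplex may be None."""
    m = MEDIA_RE.search(ifconfig_output)
    if m is None:
        raise ValueError("no resolved media found")
    speed = m.group(1)
    options = (m.group(2) or "").split(",")
    # Pick the option that names the duplex setting, if any.
    duplex = next((o for o in options if o.endswith("-duplex")), None)
    return speed, duplex
```

For the ifconfig output posted for bge1 this gives 1000baseT and
full-duplex; the switch port facing bge1 should report the same pair.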


More information about the freebsd-stable mailing list