SunFire X2200 ilo's bge1 DOWN/UP

Daniel Braniss danny at cs.huji.ac.il
Thu May 30 06:44:42 UTC 2013


> On Tue, May 28, 2013 at 09:55:24AM +0300, Daniel Braniss wrote:
> > > On Tue, May 28, 2013 at 09:28:00AM +0300, Daniel Braniss wrote:
> > > > > On Mon, May 27, 2013 at 10:59:28AM +0300, Daniel Braniss wrote:
> > > > > > > On Fri, May 24, 2013 at 05:31:13PM +0300, Daniel Braniss wrote:
> > > > > > > > Hi, after upgrading to 9.1-stable, this particular hardware, a SunFire X2200,
> > > > > > > 
> > > > > > > Show me the dmesg output (bge(4) and brgphy(4) lines only) and the 'ifconfig bge1' output.
> > > > > > > 
> > > > > > 
> > > > > > bge0: <Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x009003> mem 0xfdff0000-0xfdffffff,0xfdfe0000-0xfdfeffff irq 17 at device 4.0 on pci6
> > > > > > bge0: CHIP ID 0x00009003; ASIC REV 0x09; CHIP REV 0x90; PCI-X 133 MHz
> > > > > > miibus2: <MII bus> on bge0
> > > > > > brgphy0: <BCM5714 1000BASE-T media interface> PHY 1 on miibus2
> > > > > > brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
> > > > > > bge0: Ethernet address: 00:1b:24:5d:5b:bd
> > > > > > bge1: <Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x009003> mem 0xfdfc0000-0xfdfcffff,0xfdfb0000-0xfdfbffff irq 18 at device 4.1 on pci6
> > > > > > bge1: CHIP ID 0x00009003; ASIC REV 0x09; CHIP REV 0x90; PCI-X 133 MHz
> > > > > > miibus3: <MII bus> on bge1
> > > > > > brgphy1: <BCM5714 1000BASE-T media interface> PHY 1 on miibus3
> > > > > > brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
> > > > > > bge1: Ethernet address: 00:1b:24:5d:5b:be
> > > > > > 
> > > > > > sf-10> ifconfig bge1
> > > > > > bge1: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
> > > > > >         options=8009b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LINKSTATE>
> > > > > >         ether 00:1b:24:5d:5b:be
> > > > > >         nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
> > > > > >         media: Ethernet autoselect (100baseTX <full-duplex>)
> > > > > >         status: active
> > > > > > 
> > > > > 
> > > > > Because bge1 is not UP, I wonder how you get link UP/DOWN events.
> > > > > Do you have some network script run by cron?
> > > > 
> > > > No scripts.
> > > > This port is shared with the ILO/IPMI, and back in March you fixed a problem
> > > > where it hung soon after being initialized by the driver
> > > > (r248226, but I'm not sure it was ever MFC'ed).
> > > 
> > > It was MFCed.
> > > 
> > > > Initially I thought it could be caused by connections to it from other
> > > > hosts (either via the web or ssh), so I killed them, but it didn't help.
> > > > Without that patch the connection fails, and I don't see any DOWN/UP.
> > > 
> > > Could you check how many interrupts you get from bge1?
> > > Ideally you shouldn't get any interrupts for bge1.
> > 
> > It's not even listed :-)
> > sf-04> vmstat -i
> > interrupt                          total       rate
> > irq3: uart1                          964          0
> > irq4: uart0                            6          0
> > irq14: ata0                       227354          0
> > irq17: bge0                      1021981          2
> > irq21: ohci0                          28          0
> > irq22: ehci0                           2          0
> > irq23: atapci1                    293228          0
> > cpu0:timer                     383244076       1124
> > cpu1:timer                       2225144          6
> > cpu2:timer                       2056087          6
> > cpu3:timer                       2093943          6
> > Total                          391162813       1147
> > 
> 
> Then the only way a link UP/DOWN event could be generated for a DOWN
> interface would be an invocation of the media status query
> (e.g. ifconfig -a) triggered by an external application.  Most
> drivers I touched check the IFF_UP flag before poking the media status
> register. However, I'm not sure that's what you're seeing, because you
> do not use any network script run by cron.
> Anyway, try the attached patch and let me know whether it makes any
> difference.
> 
> > > 
> > > > 
> > > > > 
> > > > > > > > is toggling bge1 DOWN/UP every few hours; this port is being used by the ILO.
> > > > > > > > To check, I upgraded another identical host, and the same problem appears. 
> > > > > > > 
> > > > > > > What is the last known working revision?
> > > > > > 
> > > > > > I have no idea, but I have older versions and I'll start from the oldest
> > > > > > (9.1-prerelease). It will take time, though, since it takes hours until it happens.
> > > > > > 
> > > > > 
> > > > > ok.
> > > > 
> > > > 
> > 
> > 
> 
> [attachment: bge.media_sts.diff]
> 
> Index: sys/dev/bge/if_bge.c
> ===================================================================
> --- sys/dev/bge/if_bge.c	(revision 251021)
> +++ sys/dev/bge/if_bge.c	(working copy)
> @@ -5583,6 +5583,10 @@ bge_ifmedia_sts(struct ifnet *ifp, struct ifmediar
>  
>  	BGE_LOCK(sc);
>  
> +	if ((ifp->if_flags & IFF_UP) == 0) {
> +		BGE_UNLOCK(sc);
> +		return;
> +	}
>  	if (sc->bge_flags & BGE_FLAG_TBI) {
>  		ifmr->ifm_status = IFM_AVALID;
>  		ifmr->ifm_active = IFM_ETHER;
> 

After 18 hours, the logs are empty!
It seems the patch fixes the problem.

Now maybe it's time to hunt down whoever is randomly calling bge_ifmedia_sts...

thanks,
	danny




More information about the freebsd-stable mailing list