[Fwd: Re: bge Ierr rate increase from 5.3R -> 6.1R]

Tue Feb 6 22:19:02 UTC 2007

On Wed, Feb 07, 2007 at 01:31:39AM +1100, Bruce Evans wrote:
> On Tue, 6 Feb 2007, Oleg Bulyzhin wrote:
> 
> >Sorry for the late reply - i was AFK for some time and didnt read mails.
> >Patch against 6.2R attached, please let me know does it helps or not.
> 
> I didn't test the old version of this since I have too many local changes
> in my non-6.x versions.  I might test the 6.x version.
> 
> I use jdp's quicker fix.  It works fine for detecting cable unplug
> and replug, but link detection is still very bad at boot time and
> after down/up (seems to be worse for down/up than unplug/replug?).
> Link detection in -current generally seems to be much worse than
> in 5.2.  On some systems I use two ping -c2's early in the boot to
> wait for the link to actually be up.  The first ping tends to fail
> and the second tends to work, both long after the lonk claims to
> be up.  Then other network activity still takes too long to start.
> Without the pings, an "ntpdate -b" early in the boot fails about
> half the time and gives messed up timing activity when it fails,
> and initial nfs mounts tak 30-60 seconds.  Later after down/up and
> waiting for the "up" message, ttcp -u usually fails to connect the
> first time and then works normally with no failure or connection
> delay the second time.

Could you please give some more details? I'm intersted in:
- chip version
- are you using 'auto' media or fixed one?
- have you tried verbose boot? (MAC's link state (bge_link) reported with it).
- is there recipe how to trigger erroneous behaviour? i'll try it on mine
  bge cards. (i have bcm5721 5700 & 5701, but i didnt notice errors in
  link handling with -current driver (and i had problems with 5.x driver))

The fact 'ping' workaround does help looks like lost interrupt.

> 
> % Index: sys/dev/bge/if_bgereg.h
> % ===================================================================
> % RCS file: /home/ncvs/src/sys/dev/bge/if_bgereg.h,v
> % retrieving revision 1.36.2.8.2.1
> % diff -u -r1.36.2.8.2.1 if_bgereg.h
> % --- sys/dev/bge/if_bgereg.h	21 Dec 2006 21:53:54 -0000	1.36.2.8.2.1
> % +++ sys/dev/bge/if_bgereg.h	5 Feb 2007 23:23:22 -0000
> % @@ -2460,8 +2460,13 @@
> %  	uint32_t		bge_tx_buf_ratio;
> %  	int			bge_if_flags;
> %  	int			bge_txcnt;
> % -	int			bge_link;	/* link state */
> % -	int			bge_link_evt;	/* pending link event */
> % +	uint32_t		bge_sts;
> % +#define	BGE_STS_LINK		0x00000001	/* MAC link status */
> % +#define	BGE_STS_LINK_EVT	0x00000002	/* pending link 
> event */
> % +#define	BGE_STS_AUTOPOLL	0x00000004	/* PHY auto-polling  
> */
> % +#define BGE_STS_BIT(sc, x)	((sc)->bge_sts & (x))
> % +#define BGE_STS_SETBIT(sc, x)	((sc)->bge_sts |= (x))
> % +#define BGE_STS_CLRBIT(sc, x)	((sc)->bge_sts &= ~(x))
> 
> Style bugs:
> - inconsistents tabs.  bge mostly consitently doesn't follow KNF for the
>   tab after #define.

mea culpa.

> - I don't like obfuscating bit testing and setting using macros.  The
>   above macros are quite different from the BGE and PCI SETBIT and CLRBIT
>   macros -- the latter operate on hardware bits and do something useful
>   by hiding the details of the hardware accesses, and aren't assoicated
>   with bit testing macros.  Apart from the above, testing of hardware bits
>   and testing an setting of software bits is always done using direct
>   "&" and "|" operations in bge.

I'm going to agree with you. (I was hesitating about using macros here and
your arguments look reasonable.)

> 
> % Index: sys/dev/bge/if_bge.c
> % ===================================================================
> % RCS file: /home/ncvs/src/sys/dev/bge/if_bge.c,v
> % retrieving revision 1.91.2.18.2.1
> % diff -u -r1.91.2.18.2.1 if_bge.c
> % --- sys/dev/bge/if_bge.c	21 Dec 2006 21:53:54 -0000	1.91.2.18.2.1
> % +++ sys/dev/bge/if_bge.c	5 Feb 2007 23:23:29 -0000
> % @@ -2645,12 +2648,12 @@
> % 
> %  	/* Note link event. It will be processed by POLL_AND_CHECK_STATUS 
> cmd */
> %  	if (statusword & BGE_STATFLAG_LINKSTATE_CHANGED)
> % -		sc->bge_link_evt++;
> % +		BGE_STS_SETBIT(sc, BGE_STS_LINK_EVT);
> % 
> %  	if (cmd == POLL_AND_CHECK_STATUS)
> %  		if ((sc->bge_asicrev == BGE_ASICREV_BCM5700 &&
> %  		    sc->bge_chipid != BGE_CHIPID_BCM5700_B2) ||
> % -		    sc->bge_link_evt || sc->bge_tbi)
> % +		    BGE_STS_BIT(sc, BGE_STS_LINK_EVT) || sc->bge_tbi)
> %  			bge_link_upd(sc);
> 
> I suspected this is the cause of slowness for polling in idle (see previous
> mail), but since it doesn't do any PCI access directly it should be fast
> enough.
> The macros make it unclear that it doesn't do any PCI accesses
> directly.
Agree.
> 
> % @@ -2699,7 +2702,7 @@
> % 
> %  	if ((sc->bge_asicrev == BGE_ASICREV_BCM5700 &&
> %  	    sc->bge_chipid != BGE_CHIPID_BCM5700_B2) ||
> % -	    statusword || sc->bge_link_evt)
> % +	    statusword || BGE_STS_BIT(sc, BGE_STS_LINK_EVT))
> %  		bge_link_upd(sc);
> % 
> %  	if (ifp->if_drv_flags & IFF_DRV_RUNNING) {
> 
> The software link status handling causes a problem in the interrupt
> handler.  To avoid pessimizing the case of shared interrupts, the
> interrupt handler should be able to read the status word from the
> status block and return without doing anything if the interrupt is not
> for it.  This can be done without acquiring the driver lock (since the
> driver lock is neither necessary nor sufficient for accessing the
> status block).  Software link status gets in the way of this, since
> accessing it requires the driver lock.
You are right, but at the moment, bge_intr() is locked and does not care
about shared interrupts.  If we are going to fix this i would vote for 
using taskqueue api - in this case we can test 'link event' flag inside
locked taskqueue thread.

>(Note that `statusword' in the
> above is now bogusly named.  It used to be the status word from the
> status block but is now the mac status for a PCI register.  It is still
> from the status block in similar code in bge_poll().  Not pessimizing
> the case of shared interrupts requires using the status block again,
> and then the read from the PCI register might become a dummy.)
We can not avoid pci register read before syncing status block - we should
flush data posted at pci bridge, otherwise we can 'miss' interrupt.

> 
> Elsewhere, you added code to force interrupt after link status changes.
> I think this is to get the interrupt handler to look at the software
> link status in the above.  When I first saw it, I thought that forcing
> an interrupt is needed in more places.
Forced interrupt used for TBI cards since they do not have PHY auto-polling.

> 
> There are also complications to work around hardware bugs in reporting
> the link status.  Maybe the complications can be avoided by pushing
> them to the tick routine.  My version doesn't worry about them except
> in a commment and returns early without locking if the status block
> doesn't claim to have been updated.
> 
> Bruce

-- 
Oleg.

================================================================
=== Oleg Bulyzhin -- OBUL-RIPN -- OBUL-RIPE -- oleg at rinet.ru ===
================================================================