[Fwd: Re: bge Ierr rate increase from 5.3R -> 6.1R]
Oleg Bulyzhin
oleg at FreeBSD.org
Tue Feb 6 22:19:02 UTC 2007
On Wed, Feb 07, 2007 at 01:31:39AM +1100, Bruce Evans wrote:
> On Tue, 6 Feb 2007, Oleg Bulyzhin wrote:
>
> >Sorry for the late reply - i was AFK for some time and didnt read mails.
> >Patch against 6.2R attached, please let me know does it helps or not.
>
> I didn't test the old version of this since I have too many local changes
> in my non-6.x versions. I might test the 6.x version.
>
> I use jdp's quicker fix. It works fine for detecting cable unplug
> and replug, but link detection is still very bad at boot time and
> after down/up (seems to be worse for down/up than unplug/replug?).
> Link detection in -current generally seems to be much worse than
> in 5.2. On some systems I use two ping -c2's early in the boot to
> wait for the link to actually be up. The first ping tends to fail
> and the second tends to work, both long after the lonk claims to
> be up. Then other network activity still takes too long to start.
> Without the pings, an "ntpdate -b" early in the boot fails about
> half the time and gives messed up timing activity when it fails,
> and initial nfs mounts tak 30-60 seconds. Later after down/up and
> waiting for the "up" message, ttcp -u usually fails to connect the
> first time and then works normally with no failure or connection
> delay the second time.
Could you please give some more details? I'm intersted in:
- chip version
- are you using 'auto' media or fixed one?
- have you tried verbose boot? (MAC's link state (bge_link) reported with it).
- is there recipe how to trigger erroneous behaviour? i'll try it on mine
bge cards. (i have bcm5721 5700 & 5701, but i didnt notice errors in
link handling with -current driver (and i had problems with 5.x driver))
The fact 'ping' workaround does help looks like lost interrupt.
>
> % Index: sys/dev/bge/if_bgereg.h
> % ===================================================================
> % RCS file: /home/ncvs/src/sys/dev/bge/if_bgereg.h,v
> % retrieving revision 1.36.2.8.2.1
> % diff -u -r1.36.2.8.2.1 if_bgereg.h
> % --- sys/dev/bge/if_bgereg.h 21 Dec 2006 21:53:54 -0000 1.36.2.8.2.1
> % +++ sys/dev/bge/if_bgereg.h 5 Feb 2007 23:23:22 -0000
> % @@ -2460,8 +2460,13 @@
> % uint32_t bge_tx_buf_ratio;
> % int bge_if_flags;
> % int bge_txcnt;
> % - int bge_link; /* link state */
> % - int bge_link_evt; /* pending link event */
> % + uint32_t bge_sts;
> % +#define BGE_STS_LINK 0x00000001 /* MAC link status */
> % +#define BGE_STS_LINK_EVT 0x00000002 /* pending link
> event */
> % +#define BGE_STS_AUTOPOLL 0x00000004 /* PHY auto-polling
> */
> % +#define BGE_STS_BIT(sc, x) ((sc)->bge_sts & (x))
> % +#define BGE_STS_SETBIT(sc, x) ((sc)->bge_sts |= (x))
> % +#define BGE_STS_CLRBIT(sc, x) ((sc)->bge_sts &= ~(x))
>
> Style bugs:
> - inconsistents tabs. bge mostly consitently doesn't follow KNF for the
> tab after #define.
mea culpa.
> - I don't like obfuscating bit testing and setting using macros. The
> above macros are quite different from the BGE and PCI SETBIT and CLRBIT
> macros -- the latter operate on hardware bits and do something useful
> by hiding the details of the hardware accesses, and aren't assoicated
> with bit testing macros. Apart from the above, testing of hardware bits
> and testing an setting of software bits is always done using direct
> "&" and "|" operations in bge.
I'm going to agree with you. (I was hesitating about using macros here and
your arguments look reasonable.)
>
> % Index: sys/dev/bge/if_bge.c
> % ===================================================================
> % RCS file: /home/ncvs/src/sys/dev/bge/if_bge.c,v
> % retrieving revision 1.91.2.18.2.1
> % diff -u -r1.91.2.18.2.1 if_bge.c
> % --- sys/dev/bge/if_bge.c 21 Dec 2006 21:53:54 -0000 1.91.2.18.2.1
> % +++ sys/dev/bge/if_bge.c 5 Feb 2007 23:23:29 -0000
> % @@ -2645,12 +2648,12 @@
> %
> % /* Note link event. It will be processed by POLL_AND_CHECK_STATUS
> cmd */
> % if (statusword & BGE_STATFLAG_LINKSTATE_CHANGED)
> % - sc->bge_link_evt++;
> % + BGE_STS_SETBIT(sc, BGE_STS_LINK_EVT);
> %
> % if (cmd == POLL_AND_CHECK_STATUS)
> % if ((sc->bge_asicrev == BGE_ASICREV_BCM5700 &&
> % sc->bge_chipid != BGE_CHIPID_BCM5700_B2) ||
> % - sc->bge_link_evt || sc->bge_tbi)
> % + BGE_STS_BIT(sc, BGE_STS_LINK_EVT) || sc->bge_tbi)
> % bge_link_upd(sc);
>
> I suspected this is the cause of slowness for polling in idle (see previous
> mail), but since it doesn't do any PCI access directly it should be fast
> enough.
> The macros make it unclear that it doesn't do any PCI accesses
> directly.
Agree.
>
> % @@ -2699,7 +2702,7 @@
> %
> % if ((sc->bge_asicrev == BGE_ASICREV_BCM5700 &&
> % sc->bge_chipid != BGE_CHIPID_BCM5700_B2) ||
> % - statusword || sc->bge_link_evt)
> % + statusword || BGE_STS_BIT(sc, BGE_STS_LINK_EVT))
> % bge_link_upd(sc);
> %
> % if (ifp->if_drv_flags & IFF_DRV_RUNNING) {
>
> The software link status handling causes a problem in the interrupt
> handler. To avoid pessimizing the case of shared interrupts, the
> interrupt handler should be able to read the status word from the
> status block and return without doing anything if the interrupt is not
> for it. This can be done without acquiring the driver lock (since the
> driver lock is neither necessary nor sufficient for accessing the
> status block). Software link status gets in the way of this, since
> accessing it requires the driver lock.
You are right, but at the moment, bge_intr() is locked and does not care
about shared interrupts. If we are going to fix this i would vote for
using taskqueue api - in this case we can test 'link event' flag inside
locked taskqueue thread.
>(Note that `statusword' in the
> above is now bogusly named. It used to be the status word from the
> status block but is now the mac status for a PCI register. It is still
> from the status block in similar code in bge_poll(). Not pessimizing
> the case of shared interrupts requires using the status block again,
> and then the read from the PCI register might become a dummy.)
We can not avoid pci register read before syncing status block - we should
flush data posted at pci bridge, otherwise we can 'miss' interrupt.
>
> Elsewhere, you added code to force interrupt after link status changes.
> I think this is to get the interrupt handler to look at the software
> link status in the above. When I first saw it, I thought that forcing
> an interrupt is needed in more places.
Forced interrupt used for TBI cards since they do not have PHY auto-polling.
>
> There are also complications to work around hardware bugs in reporting
> the link status. Maybe the complications can be avoided by pushing
> them to the tick routine. My version doesn't worry about them except
> in a commment and returns early without locking if the status block
> doesn't claim to have been updated.
>
> Bruce
--
Oleg.
================================================================
=== Oleg Bulyzhin -- OBUL-RIPN -- OBUL-RIPE -- oleg at rinet.ru ===
================================================================
More information about the freebsd-net
mailing list