regression: msk0 watchdog timeout and interrupt storm

Mon Jan 6 04:30:48 UTC 2014

Pyun,

Replying to self since I did not get your reply directly but saw it in
the stable10 mailing list archive.  I pasted in your responses, so it's
really a reply to you.

Sorry for the delay in replying to your email of Jan 2.  I had some
email trouble (self-induced by a DNS change) that should be fixed now.

Curtis

In message <201401012144.s01LivSi099164 at maildrop2.v6ds.occnc.com>
Curtis Villamizar writes:
> >  
> > Replying to self (and top posting).
> >  
> > I'm not sure if the problem is fixed or masked.
> >  
> > The symptom (watchdog and interrupt storm) has gone away with the
> > following change in if_mskreg.h:
> >  
> > @@ -2329,8 +2329,13 @@
> >   */
> >  #if (BUS_SPACE_MAXADDR > 0xFFFFFFFF)
> >  #define        MSK_64BIT_DMA
> > +#if 1
> > +#define MSK_TX_RING_CNT                256
> > +#define MSK_RX_RING_CNT                256
> > +#else
> >  #define MSK_TX_RING_CNT                384
> >  #define MSK_RX_RING_CNT                512
> > +#endif
> >  #else
> >  #undef MSK_64BIT_DMA
> >  #define MSK_TX_RING_CNT                256
> >  
> > This backs out a very small part of the change made to if_mskreg.h in
> > revision 227582.
> >  
> > The following is what I think is affected by this change:
> >  
> > 	count = imin(4096, roundup2(count, 1024));
> > 	sc->msk_stat_count = count;
> > 	stat_sz = count * sizeof(struct msk_stat_desc);
> >  
> > The change makes count end up being 1024 (and stat_sz 8192).
> >  
> > For me the problem is fixed/masked but I would also consider putting
> > the increase to MSK_TX_RING_CNT and MSK_RX_RING_CNT back and forcing
> > count above to be no greater than 1024 if that would help someone else
> > debug the problem.  I'm not sure where the 4096 came from but
> > replacing that with 1024 is equivalent to "count = 1024" with no math
> > involved.
>  
> Marvell calls DMA descriptors as LEs. The maximum number of status
> LEs supported by controller is 4096 and it should be large enough
> to hold status LE update(for dual-port controllers, the status
> DMA block is shared between each port).

Yes.  I am aware of this, but regardless I ran into this bug, and
forcing MSK_TX_RING_CNT and MSK_RX_RING_CNT back to 256 removed the
symptom.
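
To make the arithmetic concrete, here is a minimal userland sketch (not
driver code) of how I think the two constants feed the status ring
size.  Assumptions: the starting count is 3 * RX + TX, which is not
quoted in this thread and is only my reading of if_msk.c; the status LE
is 8 bytes, which is what stat_sz 8192 at count 1024 implies; all the
*_sketch names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the 8-byte status LE (two 32-bit words). */
struct stat_desc_sketch {
	uint32_t status;
	uint32_t control;
};

/* Userland stand-in for the kernel's roundup2(); align must be a power of 2. */
static int
roundup2_sketch(int x, int align)
{
	assert(align > 0 && (align & (align - 1)) == 0);
	return ((x + (align - 1)) & ~(align - 1));
}

/*
 * ASSUMPTION: the starting count is 3 * RX + TX.  That line is not
 * quoted in this thread; it is my reading of if_msk.c.
 * cap is the 4096 in the quoted imin() call.
 */
static int
stat_count_sketch(int rx_cnt, int tx_cnt, int cap)
{
	int count = roundup2_sketch(3 * rx_cnt + tx_cnt, 1024);

	return (count < cap ? count : cap);	/* same result as imin(cap, count) */
}

/* Size in bytes of a status ring with the given number of entries. */
static size_t
stat_bytes_sketch(int count)
{
	return ((size_t)count * sizeof(struct stat_desc_sketch));
}
```

Under those assumptions, 256/256 gives a count of 1024 (8192 bytes),
while 512/384 gives 2048 (16384 bytes).  Passing 1024 as the cap instead
of 4096 gives 1024 in both cases.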

> > This does seem to me like a regression in 10.0 caused by the change to
> > if_mskreg.h (Nov 16).  The workaround so far has been fine for me.
>  
> If you revert the change made in r258790, does the issue go away?
> Are you running amd64?  Because you touched #if (BUS_SPACE_MAXADDR
> > 0xFFFFFFFF) block in if_mskreg.h I guess you're running amd64 but
> I need confirmation. If your system have more than 4GB memory on
> amd64, could you reduce amount of available memory to be less than
> 4GB?(i.e. set hw.physmem in loader.conf)
> Also would you show me dmesg(8) output(msk(4) and e1000phy(4) only)
> to know exact Yukon controller model?

Yes, it is amd64.

uname -m
amd64

CPU: AMD Athlon(tm) II X2 B24 Processor (2992.58-MHz K8-class CPU)
 Origin = "AuthenticAMD" Id = 0x100f63 Family = 0x10 Model = 0x6 Stepping = 3
 Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
 Features2=0x802009<SSE3,MON,CX16,POPCNT>
 AMD Features=0xee500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM,3DNow!+,3DNow!>
 AMD Features2=0x37ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,SKINIT,WDT>
 TSC: P-state invariant

pciconf -lcv
[...]
mskc0@pci0:2:0:0:       class=0x020000 card=0x305817aa chip=0x438011ab rev=0x10 hdr=0x00
    vendor     = 'Marvell Technology Group Ltd.'
    device     = '88E8057 PCI-E Gigabit Ethernet Controller'
    class      = network
    subclass   = ethernet
    cap 01[48] = powerspec 3  supports D0 D1 D2 D3  current D0
    cap 05[5c] = MSI supports 1 message, 64 bit enabled with 1 message
    cap 10[c0] = PCI-Express 2 legacy endpoint max data 128(128) link x1(x1)
                 speed 2.5(2.5) ASPM disabled(L0s/L1)
    ecap 0001[100] = AER 1 0 fatal 0 non-fatal 1 corrected
    ecap 0003[130] = Serial 1 ef3856ffffdc9cc8

Please let me know what I could do to help debug this.

I did not back out the change entirely (yet).  In effect I backed out
only the change to the two constants MSK_TX_RING_CNT and
MSK_RX_RING_CNT, and that was enough to make the problem go away.
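
If it helps, this is the sort of check I could log on each pass through
msk_handle_events.  It is only a userland sketch: the snapshot struct,
the verdict names and the *_sketch functions are all hypothetical, and
it assumes the status ring index simply wraps modulo the entry count.
The idea is to tell apart "index ran off the ring" from "nothing to
consume" from "hardware is ahead but nothing was consumed".

```c
#include <assert.h>
#include <stdint.h>

enum stat_verdict {
	STAT_PROGRESS,		/* consumer index advanced */
	STAT_IDLE,		/* nothing consumed, cons == put index */
	STAT_STUCK,		/* nothing consumed, put index differs */
	STAT_OUT_OF_RANGE	/* some index lies outside the ring */
};

/* Hypothetical per-call snapshot; field names are illustrative only. */
struct stat_snapshot {
	uint32_t cons_before;	/* sc->msk_stat_cons on entry */
	uint32_t cons_after;	/* sc->msk_stat_cons on exit */
	uint32_t put_idx;	/* value read from STAT_PUT_IDX */
	uint32_t count;		/* sc->msk_stat_count */
};

/* ASSUMPTION: the ring index wraps modulo the number of entries. */
static uint32_t
ring_next_sketch(uint32_t idx, uint32_t count)
{
	assert(count > 0);
	return ((idx + 1) % count);
}

static enum stat_verdict
classify_sketch(const struct stat_snapshot *s)
{
	if (s->cons_before >= s->count || s->cons_after >= s->count ||
	    s->put_idx >= s->count)
		return (STAT_OUT_OF_RANGE);
	if (s->cons_after != s->cons_before)
		return (STAT_PROGRESS);
	return (s->cons_after == s->put_idx ? STAT_IDLE : STAT_STUCK);
}
```

Index 1024 is out of range for a 1024-entry ring (valid indices are 0 to
1023) but sits mid-ring if the ring has 2048 entries, so logging
sc->msk_stat_count next to the index should show which case I am in.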

Curtis

> > Curtis
> >  
> >  
> > In message <201401010153.s011rNcm082703 at maildrop2.v6ds.occnc.com>
> > Curtis Villamizar writes:
> > >  
> > > I'm getting an interrupt storm from mskc running with the latest
> > > if_msk.c code.  The OS is built from source (259540):
> > >  
> > > FreeBSD 10.0-PRERELEASE (GENERIC) #0 r259540: Sat Dec 21 00:05:39 EST 2013
> > >  
> > > While not the latest, the point is that sys/dev/msk is up to date wrt
> > > stable_9 and also wrt head.
> > >  
> > > The odd thing is that the machine seemed to run fine for a day or two
> > > and then started exhibiting this behaviour and has become useless.
> > >  
> > > This is now highly reproducible (it happens within seconds when trying
> > > to do a long file transfer between two machines with GbE) so if there
> > > is anything I can do to instrument this, please make suggestions.
> > >  
> > > What I know so far is:
> > >  
> > >   1.  When the watchdog occurs, Y2_IS_STAT_BMU is set in the prior
> > >       interrupt mask.
> > >  
> > >   2.  This would take us from msk_intr into msk_handle_events, with
> > >       msk_handle_events returning 0.
> > >  
> > >   3.  msk_handle_events reads in sc->msk_stat_cons.  The last recorded
> > >       value of sc->msk_stat_cons is always 1024.
> > >  
> > >   4.  The only way to exit msk_handle_events with sc->msk_stat_cons
> > >       greater than zero yet not do anything is to hit the top-of-loop
> > >       conditional and fall out:
> > >  
> > >       sd = &sc->msk_stat_ring[cons];
> > >       control = le32toh(sd->msk_control);
> > >       if ((control & HW_OWNER) == 0)
> > >           break;
> > >  
> > >   5.  The code after the loop can return zero if the ring buffer
> > >       pointer hasn't moved.  That code is:
> > >  
> > >       sc->msk_stat_cons = cons;
> > >       bus_dmamap_sync(sc->msk_stat_tag, sc->msk_stat_map,
> > >           BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
> > >  
> > >       if (rxput[MSK_PORT_A] > 0)
> > >               msk_rxput(sc->msk_if[MSK_PORT_A]);
> > >       if (rxput[MSK_PORT_B] > 0)
> > >               msk_rxput(sc->msk_if[MSK_PORT_B]);
> > >  
> > >       return (sc->msk_stat_cons != CSR_READ_2(sc, STAT_PUT_IDX));
> > >  
> > >   6.  If the return value is zero, the interrupt isn't cleared.  That
> > >       was suspect.  The code in msk_intr is:
> > >  
> > >       domore = msk_handle_events(sc);
> > >       if ((status & Y2_IS_STAT_BMU) != 0 && domore == 0)
> > >               CSR_WRITE_4(sc, STAT_CTRL, SC_STAT_CLR_IRQ);
> > >  
> > >   7.  This code before the return in msk_handle_events should force
> > >       the clear but doesn't fix anything.
> > >  
> > >       if ((control & HW_OWNER) == 0)
> > >               return;
> > >  
> > > This looks like some sort of fall off the end of a ring buffer type of
> > > problem (since it always points to entry 0x400) but since I haven't
> > > done driver work in ages, that is mostly just a wild guess and I
> > > really have no idea yet as to what is going wrong.
> > >  
> > > Also please keep me on the Cc since I'm not subscribed to the list,
> > > though I will check the archives from time to time.
> > >  
> > > Thanks,
> > >  
> > > Curtis
> > >  
> > >  
> > > reference:
> > > http://lists.freebsd.org/pipermail/freebsd-stable/2013-November/075699.html