em driver regression
Mike Tancsa
mike at sentex.net
Fri Apr 9 13:17:16 UTC 2010
At 07:07 PM 4/8/2010, Pyun YongHyeon wrote:
>On Thu, Apr 08, 2010 at 02:06:09PM -0700, Jack Vogel wrote:
> > Only one device supported by em does multiqueue right now, and that is
> > Hartwell, 82574.
> >
>
>Thanks for the info.
>
>Mike, here is an updated patch. UDP bulk TX transfer performance has
>recovered a lot (about 890Mbps), but it still shows bad numbers
>compared to other controllers. For example, bce(4) shows about
>958Mbps for the same load.
>During testing I found a strong indication of a packet reordering
>issue in the drbr interface. If I force the driver to use a single TX
>queue, em(4) gets 950Mbps, as it used to.
>
>Jack, as we discussed regarding a possible drbr issue with igb(4), UDP
>transfer seems to suffer from a packet reordering issue here. Can we
>make em(4)/igb(4) use a single TX queue until we solve the drbr
>interface issue? Given that only one em(4) controller supports
>multiqueue, dropping multiqueue support for em(4) does not look bad to me.
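
One way to put a number on the suspected reordering is to stamp each UDP datagram with an increasing sequence number on the sender and count late arrivals on the receiver. The sketch below is only an illustration of that idea: the function name count_reordered and the per-datagram sequence numbers are assumptions and are not part of em(4), drbr, or the test tool used in this thread. It also ignores sequence number wraparound.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: count packets that arrive carrying a sequence
 * number lower than the highest one already seen, i.e. packets that
 * were overtaken somewhere between sender and receiver.
 * Duplicates of the current highest number are not counted.
 */
size_t
count_reordered(const uint32_t *seq, size_t n)
{
	uint32_t highest = 0;
	size_t i, late = 0;

	assert(seq != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (i > 0 && seq[i] < highest)
			late++;			/* arrived after a newer packet */
		else
			highest = seq[i];	/* new high-water mark */
	}
	return (late);
}
```

Comparing this count between a multi-queue and a forced single-queue run would show whether the reordering follows the TX queue configuration.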
No watchdog errors overnight. I wonder whether the issue was due to
100Mb, or whether the patch from current fixed it. I will try today with
the new patch below! I am guessing the rejection was due to the RX/TX fix?
---Mike
Hmm... Looks like a unified diff to me...
The text leading up to this was:
--------------------------
|Index: sys/dev/e1000/if_em.c
|===================================================================
|--- sys/dev/e1000/if_em.c (revision 206403)
|+++ sys/dev/e1000/if_em.c (working copy)
--------------------------
Patching file if_em.c using Plan A...
Hunk #1 succeeded at 812 with fuzz 2.
Hunk #2 succeeded at 834 (offset -4 lines).
Hunk #3 succeeded at 869 (offset -4 lines).
Hunk #4 succeeded at 913 (offset -4 lines).
Hunk #5 succeeded at 941 (offset -4 lines).
Hunk #6 succeeded at 1439 (offset -4 lines).
Hunk #7 succeeded at 1452 (offset -4 lines).
Hunk #8 succeeded at 1472 (offset -4 lines).
Hunk #9 succeeded at 1532 (offset -4 lines).
Hunk #10 succeeded at 1549 (offset -4 lines).
Hunk #11 failed at 1909.
Hunk #12 succeeded at 3617 (offset 2 lines).
Hunk #13 succeeded at 4069 (offset -6 lines).
Hunk #14 succeeded at 4087 (offset 2 lines).
Hunk #15 succeeded at 4187 (offset -6 lines).
1 out of 15 hunks failed--saving rejects to if_em.c.rej
Hmm... The next patch looks like a unified diff to me...
The text leading up to this was:
--------------------------
|Index: sys/dev/e1000/if_em.h
|===================================================================
|--- sys/dev/e1000/if_em.h (revision 206403)
|+++ sys/dev/e1000/if_em.h (working copy)
--------------------------
Patching file if_em.h using Plan A...
Hunk #1 succeeded at 223.
done
1(ich10)# less if_em.c.rej
***************
*** 1908,1919 ****
bus_dmamap_sync(txr->txdma.dma_tag, txr->txdma.dma_map,
BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
E1000_WRITE_REG(&adapter->hw, E1000_TDT(txr->me), i);
- txr->watchdog_time = ticks;
- /* Call cleanup if number of TX descriptors low */
- if (txr->tx_avail <= EM_TX_CLEANUP_THRESHOLD)
- em_txeof(txr);
-
return (0);
}
--- 1909,1915 ----
bus_dmamap_sync(txr->txdma.dma_tag, txr->txdma.dma_map,
BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
E1000_WRITE_REG(&adapter->hw, E1000_TDT(txr->me), i);
return (0);
}
0(ich10)#
> > Jack
> >
> >
> > On Thu, Apr 8, 2010 at 2:05 PM, Mike Tancsa <mike at sentex.net> wrote:
> >
> > > At 04:56 PM 4/8/2010, Pyun YongHyeon wrote:
> > >
> > >> On Thu, Apr 08, 2010 at 02:31:18PM -0400, Mike Tancsa wrote:
> > >> > At 02:17 PM 4/8/2010, Pyun YongHyeon wrote:
> > >> >
> > >> > >Try this patch. It should fix the issue. It seems Jack forgot to
> > >> > >strip the CRC bytes, as the old em(4) didn't strip them, probably
> > >> > >to work around a silicon bug in old em(4) controllers.
> > >> >
> > >> > Thanks! The attached patch does indeed fix the dhclient issue.
> > >> >
> > >> >
> > >> > >It seems there are also TX issues here. The system load is too high
> > >> > >and sometimes the system is not responsive while TX is in progress.
> > >> > >Because I initiated TCP bulk transfers, TSO should reduce CPU load
> > >> > >a lot, but it didn't, so I guess it could also be related to the
> > >> > >watchdog timeouts you've seen. I'll see what can be done.
> > >> >
> > >> > Thanks for looking into that as well!!
> > >> >
> > >> > ---Mike
> > >> >
> > >>
> > >> Mike,
> > >>
> > >> Here is the patch I'm working on. It fixes the high system load, and
> > >> the system is as responsive as before. But it seems there is still
> > >> some TX issue here. Bulk UDP performance is very poor (< 700Mbps)
> > >> and I have no idea what is causing this at the moment.
> > >>
> > >> BTW, I have trouble reproducing the watchdog timeouts. I'm not sure
> > >> whether the latest fix from Jack cured it. By chance, does your
> > >> controller support multiple TX/RX queues? You can check whether em(4)
> > >> uses multi-queue with "vmstat -i". If em(4) uses multi-queue, you
> > >> should see multiple irq lines for em0.
> > >>
> > >
> > > Hi,
> > > I will give it a try later tonight! This one does not seem to.
> > >
> > > > 0(ich10)# vmstat -i
> > > > interrupt                          total      rate
> > > > irq16: uhci0+                         30         0
> > > > irq18: ehci0 uhci5                158419        17
> > > > irq19: fwohci0++                      86         0
> > > > irq21: uhci1                          17         0
> > > > irq23: uhci3 ehci1                     2         0
> > > > cpu0: timer                     18570305      1994
> > > > irq256: igb0                          80         0
> > > > irq257: igb0                         255         0
> > > > irq258: igb0                          66         0
> > > > irq259: igb0                          32         0
> > > > irq260: igb0                           2         0
> > > > irq261: igb1                        2679         0
> > > > irq262: igb1                         998         0
> > > > irq263: igb1                        2468         0
> > > > irq264: igb1                        6361         0
> > > > irq265: igb1                           2         0
> > > > irq266: em0                        33910         3
> > > > irq267: ahci1                      15317         1
> > > > cpu1: timer                     18557074      1993
> > > > cpu3: timer                     18557168      1993
> > > > cpu2: timer                     18557108      1993
> > > > Total                           74462379      7998
> > > > 0(ich10)#
> > >
>
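
The "vmstat -i" check suggested in the quoted text can be automated by counting how many irq lines name a given device. The following is a minimal sketch, not part of any FreeBSD tool: the function name count_irq_lines is an assumption, and it only handles the plain "irqN: name ... total rate" layout shown above. Device names are assumed not to be purely numeric, since the total and rate columns are compared as tokens too.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: count the "irqN:" lines in vmstat -i style text
 * that name the device dev. Trailing '+' markers (as in "uhci0+") are
 * ignored when comparing names, and leading "> " quote markers are skipped.
 */
int
count_irq_lines(const char *text, const char *dev)
{
	size_t devlen;
	int count = 0;

	assert(text != NULL && dev != NULL && dev[0] != '\0');
	devlen = strlen(dev);

	while (*text != '\0') {
		const char *eol = strchr(text, '\n');
		const char *end = (eol != NULL) ? eol : text + strlen(text);
		const char *p = text;

		/* Skip leading blanks and any mail quote markers. */
		while (p < end && (*p == ' ' || *p == '\t' || *p == '>'))
			p++;

		if (end - p > 3 && strncmp(p, "irq", 3) == 0) {
			const char *colon = memchr(p, ':', (size_t)(end - p));

			p = (colon != NULL) ? colon + 1 : end;
			/* Walk the whitespace-separated tokens after the colon. */
			while (p < end) {
				const char *tok;
				size_t len;

				while (p < end && isspace((unsigned char)*p))
					p++;
				tok = p;
				while (p < end && !isspace((unsigned char)*p))
					p++;
				len = (size_t)(p - tok);
				while (len > 0 && tok[len - 1] == '+')
					len--;
				if (len == devlen && strncmp(tok, dev, len) == 0) {
					count++;
					break;	/* count each line once */
				}
			}
		}
		text = (eol != NULL) ? eol + 1 : end;
	}
	return (count);
}
```

Applied to the output above, igb0 and igb1 each appear on five irq lines while em0 appears on one, which matches the observation that this em(4) controller does not seem to use multiple queues.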
--------------------------------------------------------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mike at sentex.net
Providing Internet since 1994 www.sentex.net
Cambridge, Ontario Canada www.sentex.net/mike