RELENG_7 em problems (and RELENG_8)

Jack Vogel jfvogel at gmail.com
Fri Sep 24 22:36:11 UTC 2010


There is a new revision of the em driver coming next week. It's going through
some stress testing over the weekend; if no issues show up I'll put it into
HEAD.

Yongari's changes in TX context handling, which affect checksum and TSO,
are added. I've also decided that multiple queues in the 82574 are just a
source of problems without a lot of benefit, so it still uses MSI-X but with
only 3 vectors, meaning it separates TX and RX but has a single queue.

It's looking very stable, and I hope it fixes everyone's issues.

Jack


On Tue, Sep 14, 2010 at 10:59 AM, Mike Tancsa <mike at sentex.net> wrote:

> Hi Jack,
>        Any plans to commit the patch below ? I have been running it on a
> number of boxes and it works as expected with no side effects.
>
>        ---Mike
>
>
>
> At 04:00 PM 8/17/2010, Pyun YongHyeon wrote:
>
>> On Tue, Aug 17, 2010 at 03:55:12PM -0400, Mike Tancsa wrote:
>> > At 02:52 PM 8/17/2010, Pyun YongHyeon wrote:
>> >
>> > >Here is updated patch for HEAD and stable/8.
>> > >http://people.freebsd.org/~yongari/em.csum_tso.20100817.patch
>> > >
>> > >It seems to work as expected under my limited environments. If
>> >
>> > Thanks! The patch applies cleanly and all works as expected now! I am
>> > no longer able to trigger the bug. I just use the stock unmodified
>> > driver normally, so no multi queues
>> >
>>
>> Glad to hear that. Thanks for testing!
>>
>> > # vmstat -i
>> > interrupt                          total       rate
>> > irq256: em0                          149          0
>> > irq257: em1                            3          0
>> > irq259: em3                          971          2
>> > irq260: ahci0                       1520          3
>> >
>> >
>> >
>> > em3: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>> >         options=219b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC>
>> >         ether 00:15:17:xx:xx:xx
>> >         inet6 fe80::215:17ff:fexx:xxxx%em3 prefixlen 64 scopeid 0x4
>> >         inet 192.168.xx.xx netmask 0xffffff00 broadcast 192.168.xx.xx
>> >         nd6 options=3<PERFORMNUD,ACCEPT_RTADV>
>> >         media: Ethernet autoselect (100baseTX <full-duplex>)
>> >         status: active
>> >
>> >
>> > em3 at pci0:3:0:0: class=0x020000 card=0x34ec8086 chip=0x10d38086
>> > rev=0x00 hdr=0x00
>> >     vendor     = 'Intel Corporation'
>> >     device     = 'Intel 82574L Gigabit Ethernet Controller (82574L)'
>> >     class      = network
>> >     subclass   = ethernet
>> >     cap 01[c8] = powerspec 2  supports D0 D3  current D0
>> >     cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
>> >     cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x1(x1)
>> >     cap 11[a0] = MSI-X supports 5 messages in map 0x1c
>> >
>> >
>> >
>> > patch < em.csum_tso.20100817.patch
>> > Hmm...  Looks like a unified diff to me...
>> > The text leading up to this was:
>> > --------------------------
>> > |Index: sys/dev/e1000/if_em.c
>> > |===================================================================
>> > |--- sys/dev/e1000/if_em.c      (revision 211398)
>> > |+++ sys/dev/e1000/if_em.c      (working copy)
>> > --------------------------
>> > Patching file sys/dev/e1000/if_em.c using Plan A...
>> > Hunk #1 succeeded at 237.
>> > Hunk #2 succeeded at 1730.
>> > Hunk #3 succeeded at 1759.
>> > Hunk #4 succeeded at 1930.
>> > Hunk #5 succeeded at 3148.
>> > Hunk #6 succeeded at 3351.
>> > Hunk #7 succeeded at 3533.
>> > Hunk #8 succeeded at 3590.
>> > Hunk #9 succeeded at 3603.
>> > Hmm...  The next patch looks like a unified diff to me...
>> > The text leading up to this was:
>> > --------------------------
>> > |Index: sys/dev/e1000/if_em.h
>> > |===================================================================
>> > |--- sys/dev/e1000/if_em.h      (revision 211398)
>> > |+++ sys/dev/e1000/if_em.h      (working copy)
>> > --------------------------
>> > Patching file sys/dev/e1000/if_em.h using Plan A...
>> > Hunk #1 succeeded at 284.
>> > done
>> >
>> >         ---Mike
>> >
>> >
>> > >you're using multiple Tx queues with em(4) it would be better to
>> > >disable Tx checksum offloading, as the driver always has to create a
>> > >new checksum context for each frame. This will effectively disable
>> > >pipelined Tx data DMA, which in turn greatly slows down Tx
>> > >performance for small frames. The reason the driver has to
>> > >create a new checksum context when it uses multiple Tx queues is a
>> > >hardware limitation. The controller tracks only the last
>> > >context descriptor that was written, so the driver does not know
>> > >the state of the checksum context configured in another Tx queue.
>> > >Hope this helps.
>> > >
>> > >>
>> > >>
>> > >>         ---Mike
>> > >>
>> > >>
>> > >> At 03:36 PM 7/2/2010, Pyun YongHyeon wrote:
>> > >> >On Fri, Jul 02, 2010 at 01:39:22PM -0400, Mike Tancsa wrote:
>> > >> >> Hi Jack,
>> > >> >>         Just a followup to the email below. I now saw what appears
>> > >> >> to be the same problem on RELENG_8, but on a different nic and with
>> > >> >> VLANs.  So not sure if this is a general em problem, a problem
>> > >> >> specific to some em NICs, or a TSO problem in general.  The issue
>> > >> >> seemed to be triggered when I added a new vlan based on
>> > >> >>
>> > >> >> em3 at pci0:14:0:0:        class=0x020000 card=0x109a15d9
>> > >> >> chip=0x109a8086 rev=0x00 hdr=0x00
>> > >> >>     vendor     = 'Intel Corporation'
>> > >> >>     device     = 'Intel PRO/1000 PL Network Adaptor (82573L)'
>> > >> >>     class      = network
>> > >> >>     subclass   = ethernet
>> > >> >>     cap 01[c8] = powerspec 2  supports D0 D3  current D0
>> > >> >>     cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
>> > >> >>     cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x1(x1)
>> > >> >>
>> > >> >> pci14: <ACPI PCI bus> on pcib5
>> > >> >> em3: <Intel(R) PRO/1000 Network Connection 7.0.5> port 0x6000-0x601f
>> > >> >> mem 0xe8300000-0xe831ffff irq 17 at device 0.0 on pci14
>> > >> >> em3: Using MSI interrupt
>> > >> >> em3: [FILTER]
>> > >> >> em3: Ethernet address: 00:30:48:9f:eb:81
>> > >> >>
>> > >> >> em3: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
>> > >> >>         options=2098<VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC>
>> > >> >>         ether 00:30:48:9f:eb:81
>> > >> >>         inet 10.255.255.254 netmask 0xfffffffc broadcast 10.255.255.255
>> > >> >>         media: Ethernet autoselect (1000baseT <full-duplex>)
>> > >> >>         status: active
>> > >> >>
>> > >> >> I had to disable tso, rxcsum and txcsum in order to see the devices on
>> > >> >> the other side of the two vlans trunked off em3.  Unfortunately, the
>> > >> >> other sides were switches 100km and 500km away, so I didn't have any
>> > >> >> tcpdump capabilities to diagnose the issue.  I had already created
>> > >> >> one vlan off this NIC and all was fine.  A few weeks later, I added a
>> > >> >> new one and I could no longer telnet into the remote switches from
>> > >> >> the local machine.... But I could telnet into the switches from
>> > >> >> machines other than the problem box. Hence, it would appear to be a
>> > >> >> general TSO issue, no? I disabled tso on the nic (I didn't disable
>> > >> >> net.inet.tcp.tso as I forgot about that). Still nothing. I could
>> > >> >> always ping the remote devices, but no tcp services.  I then
>> > >> >> remembered this issue from before, so I tried disabling tso on the
>> > >> >> NIC. Still nothing. Then I disabled rxcsum and txcsum and I could
>> > >> >> then telnet into the remote devices.
>> > >> >>
>> > >> >> This newly observed issue was from a buildworld on Mon Jun 14
>> > >> >> 11:29:12 EDT 2010.
>> > >> >>
>> > >> >> I will try and recreate the issue locally again to see if I can
>> > >> >> trigger the problem on demand.  Any thoughts on what it might be ?
>> > >> >> Perhaps an issue specific to certain em nics ?
>> > >> >>
>> > >> >
>> > >> >http://www.freebsd.org/cgi/query-pr.cgi?pr=141843
>> > >> >I'm not sure whether you're seeing the same issue though.
>> > >> >I haven't had a chance to try the latest em(4) on stable/7.
>> > >>
>> > >> --------------------------------------------------------------------
>> > >> Mike Tancsa,                                      tel +1 519 651 3400
>> > >> Sentex Communications,                            mike at sentex.net
>> > >> Providing Internet since 1994                    www.sentex.net
>> > >> Cambridge, Ontario Canada                         www.sentex.net/mike
>> > >>
>> >
>> > --------------------------------------------------------------------
>> > Mike Tancsa,                                      tel +1 519 651 3400
>> > Sentex Communications,                            mike at sentex.net
>> > Providing Internet since 1994                    www.sentex.net
>> > Cambridge, Ontario Canada                         www.sentex.net/mike
>> >
>>
>
> --------------------------------------------------------------------
> Mike Tancsa,                                      tel +1 519 651 3400
> Sentex Communications,                            mike at sentex.net
> Providing Internet since 1994                    www.sentex.net
> Cambridge, Ontario Canada                         www.sentex.net/mike
>
>


More information about the freebsd-stable mailing list