RELENG_7 em problems (and RELENG_8)

Mike Tancsa mike at sentex.net
Tue Aug 17 19:55:13 UTC 2010


At 02:52 PM 8/17/2010, Pyun YongHyeon wrote:

>Here is updated patch for HEAD and stable/8.
>http://people.freebsd.org/~yongari/em.csum_tso.20100817.patch
>
>It seems to work as expected in my limited test environment. If

Thanks! The patch applies cleanly and everything works as expected now.
I am no longer able to trigger the bug. I normally just use the stock,
unmodified driver, so there are no multiple queues:

# vmstat -i
interrupt                          total       rate
irq256: em0                          149          0
irq257: em1                            3          0
irq259: em3                          971          2
irq260: ahci0                       1520          3



em3: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
         options=219b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC>
         ether 00:15:17:xx:xx:xx
         inet6 fe80::215:17ff:fexx:xxxx%em3 prefixlen 64 scopeid 0x4
         inet 192.168.xx.xx netmask 0xffffff00 broadcast 192.168.xx.xx
         nd6 options=3<PERFORMNUD,ACCEPT_RTADV>
         media: Ethernet autoselect (100baseTX <full-duplex>)
         status: active
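
For anyone comparing the two options= masks in this thread (219b here,
2098 in the older RELENG_8 report quoted further down), a small sh
sketch that decodes the hex mask into names. The bit values are my
assumption from the IFCAP_* definitions in net/if.h and only cover the
capabilities that appear in this thread; the function name is made up
for illustration. Check them against your own source tree.

```shell
#!/bin/sh
# Decode an ifconfig "options=" hex mask into capability names.
# Bit values are assumed from FreeBSD's net/if.h (IFCAP_*); only the
# capabilities that show up in this thread are listed, so any other
# bits in the mask are silently ignored.
decode_ifcap() {
    mask=$(( 0x$1 ))
    out=""
    for pair in 1:RXCSUM 2:TXCSUM 8:VLAN_MTU 16:VLAN_HWTAGGING \
                128:VLAN_HWCSUM 256:TSO4 8192:WOL_MAGIC; do
        bit=${pair%%:*}      # decimal bit value before the colon
        name=${pair#*:}      # capability name after the colon
        if [ $(( mask & bit )) -ne 0 ]; then
            out="${out:+$out,}$name"
        fi
    done
    echo "$out"
}

decode_ifcap 219b
decode_ifcap 2098
```

The second mask lacks RXCSUM, TXCSUM and TSO4, which matches the
interface where those features had been turned off.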


em3@pci0:3:0:0: class=0x020000 card=0x34ec8086 chip=0x10d38086 rev=0x00 hdr=0x00
     vendor     = 'Intel Corporation'
     device     = 'Intel 82574L Gigabit Ethernet Controller (82574L)'
     class      = network
     subclass   = ethernet
     cap 01[c8] = powerspec 2  supports D0 D3  current D0
     cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
     cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x1(x1)
     cap 11[a0] = MSI-X supports 5 messages in map 0x1c



patch < em.csum_tso.20100817.patch
Hmm...  Looks like a unified diff to me...
The text leading up to this was:
--------------------------
|Index: sys/dev/e1000/if_em.c
|===================================================================
|--- sys/dev/e1000/if_em.c      (revision 211398)
|+++ sys/dev/e1000/if_em.c      (working copy)
--------------------------
Patching file sys/dev/e1000/if_em.c using Plan A...
Hunk #1 succeeded at 237.
Hunk #2 succeeded at 1730.
Hunk #3 succeeded at 1759.
Hunk #4 succeeded at 1930.
Hunk #5 succeeded at 3148.
Hunk #6 succeeded at 3351.
Hunk #7 succeeded at 3533.
Hunk #8 succeeded at 3590.
Hunk #9 succeeded at 3603.
Hmm...  The next patch looks like a unified diff to me...
The text leading up to this was:
--------------------------
|Index: sys/dev/e1000/if_em.h
|===================================================================
|--- sys/dev/e1000/if_em.h      (revision 211398)
|+++ sys/dev/e1000/if_em.h      (working copy)
--------------------------
Patching file sys/dev/e1000/if_em.h using Plan A...
Hunk #1 succeeded at 284.
done

         ---Mike


>you're using multiple Tx queues with em(4) it would be better to
>disable Tx checksum offloading, as the driver always has to create a
>new checksum context for each frame. This effectively disables
>pipelined Tx data DMA, which in turn greatly slows down Tx
>performance for small frames. The reason the driver has to
>create a new checksum context when it uses multiple Tx queues is
>a hardware limitation. The controller tracks only the last
>context descriptor that was written, so the driver does not know
>the state of the checksum context configured in another Tx queue.
>Hope this helps.
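
A minimal sketch of how the offload features discussed in this thread
can be switched off per interface with ifconfig(8). The wrapper
function and the RUN variable are hypothetical; the -txcsum, -rxcsum
and -tso arguments are the standard ifconfig ones. By default it only
prints the command, so it is safe to try on a machine without em(4).

```shell
#!/bin/sh
# Print (or, with RUN=1, execute) the ifconfig command that turns off
# the named offload features on one interface.
# disable_offload is a hypothetical helper, not part of FreeBSD.
disable_offload() {
    ifname=$1
    shift
    args=""
    for feature in "$@"; do
        args="$args -$feature"   # e.g. "txcsum" becomes "-txcsum"
    done
    if [ "${RUN:-0}" = "1" ]; then
        ifconfig "$ifname" $args  # unquoted on purpose: one word per flag
    else
        echo "ifconfig $ifname$args"
    fi
}

# Only Tx checksum offload, as suggested for multiple Tx queues:
disable_offload em3 txcsum
```

Calling it as "disable_offload em3 tso rxcsum txcsum" prints the
combination that was needed in the older VLAN report quoted below.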
>
> >
> >
> >         ---Mike
> >
> >
> > At 03:36 PM 7/2/2010, Pyun YongHyeon wrote:
> > >On Fri, Jul 02, 2010 at 01:39:22PM -0400, Mike Tancsa wrote:
> > >> Hi Jack,
> > >>         Just a followup to the email below. I now saw what appears
> > >> to be the same problem on RELENG_8, but on a different nic and with
> > >> VLANs.  So not sure if this is a general em problem, a problem
> > >> specific to some em NICs, or a TSO problem in general.  The issue
> > >> seemed to be triggered when I added a new vlan based on
> > >>
> > >> em3@pci0:14:0:0:        class=0x020000 card=0x109a15d9 chip=0x109a8086 rev=0x00 hdr=0x00
> > >>     vendor     = 'Intel Corporation'
> > >>     device     = 'Intel PRO/1000 PL Network Adaptor (82573L)'
> > >>     class      = network
> > >>     subclass   = ethernet
> > >>     cap 01[c8] = powerspec 2  supports D0 D3  current D0
> > >>     cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
> > >>     cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x1(x1)
> > >>
> > >> pci14: <ACPI PCI bus> on pcib5
> > >> em3: <Intel(R) PRO/1000 Network Connection 7.0.5> port 0x6000-0x601f
> > >> mem 0xe8300000-0xe831ffff irq 17 at device 0.0 on pci14
> > >> em3: Using MSI interrupt
> > >> em3: [FILTER]
> > >> em3: Ethernet address: 00:30:48:9f:eb:81
> > >>
> > >> em3: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST>
> > >> metric 0 mtu 1500
> > >>         options=2098<VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC>
> > >>         ether 00:30:48:9f:eb:81
> > >>         inet 10.255.255.254 netmask 0xfffffffc broadcast 10.255.255.255
> > >>         media: Ethernet autoselect (1000baseT <full-duplex>)
> > >>         status: active
> > >>
> > >> I had to disable tso, rxcsum and txcsum in order to see the devices on
> > >> the other side of the two vlans trunked off em3.  Unfortunately, the
> > >> other sides were switches 100km and 500km away, so I didn't have any
> > >> tcpdump capabilities to diagnose the issue.  I had already created
> > >> one vlan off this NIC and all was fine.  A few weeks later, I added a
> > >> new one and I could no longer telnet into the remote switches from
> > >> the local machine. But I could telnet into the switches from
> > >> machines other than the problem box. Hence, it would appear to be a
> > >> general TSO issue, no? I disabled tso on the nic (I didn't disable
> > >> net.inet.tcp.tso, as I forgot about that). Still nothing. I could
> > >> always ping the remote devices, but no tcp services.  I then
> > >> remembered this issue from before, so I tried disabling tso on the
> > >> NIC. Still nothing. Then I disabled rxcsum and txcsum and I could
> > >> then telnet into the remote devices.
> > >>
> > >> This newly observed issue was from a buildworld on Mon Jun 14
> > >> 11:29:12 EDT 2010.
> > >>
> > >> I will try and recreate the issue locally again to see if I can
> > >> trigger the problem on demand.  Any thoughts on what it might be ?
> > >> Perhaps an issue specific to certain em nics ?
> > >>
> > >
> > >http://www.freebsd.org/cgi/query-pr.cgi?pr=141843
> > >I'm not sure whether you're seeing the same issue though.
> > >I haven't had a chance to try the latest em(4) on stable/7.
> >
> > --------------------------------------------------------------------
> > Mike Tancsa,                                      tel +1 519 651 3400
> > Sentex Communications,                            mike at sentex.net
> > Providing Internet since 1994                    www.sentex.net
> > Cambridge, Ontario Canada                         www.sentex.net/mike
> >

--------------------------------------------------------------------
Mike Tancsa,                                      tel +1 519 651 3400
Sentex Communications,                            mike at sentex.net
Providing Internet since 1994                    www.sentex.net
Cambridge, Ontario Canada                         www.sentex.net/mike


