kern/140326: em0: watchdog timeout when communicating to windows using 9K MTU

Jack Vogel jfvogel at gmail.com
Fri Nov 6 02:00:13 UTC 2009


The following reply was made to PR kern/140326; it has been noted by GNATS.

From: Jack Vogel <jfvogel at gmail.com>
To: Maksym Sobolyev <sobomax at freebsd.org>
Cc: freebsd-gnats-submit at freebsd.org
Subject: Re: kern/140326: em0: watchdog timeout when communicating to windows 
	using 9K MTU
Date: Thu, 5 Nov 2009 17:28:50 -0800

 --0016e6d99d6125581f0477a9c469
 Content-Type: text/plain; charset=ISO-8859-1
 
 Can't do much unless you adequately identify the hardware on BOTH sides;
 believe it or not, "windows" is not a sufficient description :)
 
 I need to know what the E1000 hardware is, using pciconf -l, and I also
 need to know what is on the Windows side before having a clue on how to
 repro or help you.
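
For reference, a minimal sketch of pulling the adapter's PCI IDs out of pciconf -l output on the FreeBSD side. It assumes the chip= field that pciconf -l prints on releases of this era, which packs the device ID into the upper 16 bits and the vendor ID into the lower 16. The helper name pci_ids is hypothetical and is not part of pciconf.

```shell
#!/bin/sh
# pci_ids (hypothetical helper): read `pciconf -l` output on stdin and print
# the vendor and device IDs for one driver instance, e.g. em0.
# Assumes a chip=0xDDDDVVVV field (device ID high, vendor ID low).
pci_ids() {
    awk -v dev="$1" '
        index($1, dev "@") == 1 {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^chip=0x/) {
                    hex = substr($i, 8)    # drop the "chip=0x" prefix
                    printf "vendor=0x%s device=0x%s\n", substr(hex, 5, 4), substr(hex, 1, 4)
                }
            }
        }'
}

# On the FreeBSD host: pciconf -l | pci_ids em0
```

A vendor ID of 0x8086 is Intel; the device ID is what distinguishes one PRO/1000 variant from another, which the driver banner in dmesg alone does not.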
 
 Cheers,
 
 Jack
 
 
 On Thu, Nov 5, 2009 at 5:18 PM, Maksym Sobolyev <sobomax at freebsd.org> wrote:
 
 >
 > >Number:         140326
 > >Category:       kern
 > >Synopsis:       em0: watchdog timeout when communicating to windows using 9K MTU
 > >Confidential:   no
 > >Severity:       serious
 > >Priority:       high
 > >Responsible:    freebsd-bugs
 > >State:          open
 > >Quarter:
 > >Keywords:
 > >Date-Required:
 > >Class:          sw-bug
 > >Submitter-Id:   current-users
 > >Arrival-Date:   Fri Nov 06 01:20:01 UTC 2009
 > >Closed-Date:
 > >Last-Modified:
 > >Originator:     Maksym Sobolyev
 > >Release:        7.2-p4
 > >Organization:
 > Sippy Software, Inc.
 > >Environment:
 > FreeBSD pioneer.sippysoft.com 7.2-RELEASE-p4 FreeBSD 7.2-RELEASE-p4 #0:
 > Sun Oct  4 03:08:04 PDT 2009     root at pioneer.sippysoft.com:/usr/obj/usr/src/sys/PIONEER
 >  amd64
 > >Description:
 > My em0 interface repeatedly hangs with a watchdog timeout when
 > communicating with a Windows host at a 9K MTU.
 >
 > [sobomax at pioneer ~]$ grep em0 /var/run/dmesg.boot
 > em0: <Intel(R) PRO/1000 Network Connection 6.9.6> port 0xecc0-0xecdf mem
 > 0xfe6e0000-0xfe6fffff,0xfe6d9000-0xfe6d9fff irq 21 at device 25.0 on pci0
 > em0: Using MSI interrupt
 > em0: [FILTER]
 > em0: Ethernet address: 00:22:19:32:87:2f
 > [sobomax at pioneer ~]$ uname -a
 > FreeBSD pioneer.sippysoft.com 7.2-RELEASE-p4 FreeBSD 7.2-RELEASE-p4 #0:
 > Sun Oct  4 03:08:04 PDT 2009     root at pioneer.sippysoft.com:/usr/obj/usr/src/sys/PIONEER
 >  amd64
 > [sobomax at pioneer ~]$ ifconfig em0
 > em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
 >        options=98<VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
 >        ether 00:22:19:32:87:2f
 >        inet 192.168.1.1 netmask 0xffffff00 broadcast 192.168.1.255
 >        inet 192.168.2.1 netmask 0xffffff00 broadcast 192.168.2.255
 >        inet6 fec0::1 prefixlen 64
 >        media: Ethernet autoselect (1000baseTX <full-duplex>)
 >        status: active
 > [sobomax at pioneer ~]$ dmesg | grep watchd
 > em0: watchdog timeout -- resetting
 > em0: watchdog timeout -- resetting
 > em0: watchdog timeout -- resetting
 > em0: watchdog timeout -- resetting
 > em0: watchdog timeout -- resetting
 >
 > I managed to make a packet capture right at the time the hang happens.
 > It appears that either "MAC Pause" or "TCP Segment of reassembled PDU"
 > is the last packet that goes through before the interface hangs.
 >
 > Here is the screenshot; if somebody wants to take a closer look at the
 > actual packets, please let me know.
 >
 > http://sobomax.sippysoft.com/~sobomax/ScreenShot527.png
 >
 > Turning off TSO and TXCSUM/RXCSUM has not helped. Bringing the MTU down
 > to 1500 resolved the issue.
 >
 > I have had the same problem several times in the past (although I
 > initially attributed it to a bad cable or something like that), so it's
 > definitely not a one-off issue.
 >
 > Given the popularity of Intel PRO chips in today's computers, it looks
 > like quite a serious issue to me. Any help is greatly appreciated.
 > >How-To-Repeat:
 >
 > >Fix:
 >
 >
 > >Release-Note:
 > >Audit-Trail:
 > >Unformatted:
 > _______________________________________________
 > freebsd-bugs at freebsd.org mailing list
 > http://lists.freebsd.org/mailman/listinfo/freebsd-bugs
 > To unsubscribe, send any mail to "freebsd-bugs-unsubscribe at freebsd.org"
 >
 
 --0016e6d99d6125581f0477a9c469--

