svn commit: r267935 - head/sys/dev/e1000 (with work around?)

Fri Sep 12 21:00:35 UTC 2014

Mike Tancsa wrote:
> On 9/12/2014 10:09 AM, Mike Tancsa wrote:
> >
> > FYI, I just ran into this bug on another box, with an onboard em NIC,
> > so I don't think it's a one-off hardware issue. AMD64, FreeBSD
> > 10.0-STABLE #4 r270560:
> > This is on an Intel MB S1200BTL (S1200BT.86B.02.00.0035.030220120927)
> >
> > Unfortunately, this is also a production box, so it's difficult to
> > test. I am going to see if I can find a similar MB to test against.
> 
> 
> I found another board I can test with. It takes a bit of random
> traffic to wedge, but I can lock up the NIC to the point where I have
> to down and up it.
>
> When the NIC is wedged, running sysctl -w em.1.debug=1 shows
> 
> 
> Sep 12 11:05:05 backup3 kernel: Interface is RUNNING and ACTIVE
> Sep 12 11:05:05 backup3 kernel: em1: hw tdh = 414, hw tdt = 980
> Sep 12 11:05:05 backup3 kernel: em1: hw rdh = 768, hw rdt = 767
> Sep 12 11:05:05 backup3 kernel: em1: Tx Queue Status = 1
> Sep 12 11:05:05 backup3 kernel: em1: TX descriptors avail = 449
> Sep 12 11:05:05 backup3 kernel: em1: Tx Descriptors avail failure = 3
> Sep 12 11:05:05 backup3 kernel: em1: RX discarded packets = 0
> Sep 12 11:05:05 backup3 kernel: em1: RX Next to Check = 768
> Sep 12 11:05:05 backup3 kernel: em1: RX Next to Refresh = 767
> 
> em1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>          options=4219b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC,VLAN_HWTSO>
>          ether 00:15:17:ed:68:a4
>          inet 1.1.1.2 netmask 0xffffff00 broadcast 1.1.1.255
>          nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
>          media: Ethernet autoselect (1000baseT <full-duplex>)
>          status: active
> 
> The network traffic involves sending a lot of traffic via NFS. I found
> that if I disable TSO on the NIC, it seems to fix the problem, or at
> least makes it hard to reproduce. With TSO enabled, it took perhaps
> 30-120 seconds for the problem to manifest. On both my test and
> production boxes, I have not run into the problem in the past 45 min.
> 
To get NFS (which generates transmit lists of 35 mbufs) to work
with TSO enabled, you need both r264630 (which slightly reduces the
default size of if_hw_tsomax) and r268726 (which fixes the retry: case
in if_em.c). Neither of these patches is in 10.0 (the revision numbers
above are for head).

So, either run with TSO disabled or reduce the rsize and wsize of all NFS
mounts to 32768 (which reduces the number of mbufs in a transmit list to 19).
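
The 35 and 19 figures can be reconciled with a little arithmetic. The
sketch below is only an illustration under assumptions that are not
stated in this message: a default NFS I/O size of 65536 bytes, data
carried in 2048-byte mbuf clusters, and 3 extra mbufs for headers. The
names MCLBYTES_ASSUMED and HEADER_MBUFS_ASSUMED are made up for the
example.

```python
# Rough model of how many mbufs one NFS transmit list contains.
# Both constants are assumptions chosen to match the figures quoted above.
MCLBYTES_ASSUMED = 2048     # assumed mbuf cluster size in bytes
HEADER_MBUFS_ASSUMED = 3    # assumed extra mbufs for RPC/protocol headers


def mbufs_per_transmit_list(io_size):
    """Return the estimated mbuf count for one NFS I/O of io_size bytes."""
    data_mbufs = -(-io_size // MCLBYTES_ASSUMED)  # ceiling division
    return data_mbufs + HEADER_MBUFS_ASSUMED


if __name__ == "__main__":
    for io_size in (65536, 32768):
        print(io_size, mbufs_per_transmit_list(io_size))
```

Under these assumptions, halving rsize/wsize roughly halves the number
of data mbufs per list, which is why 32768 brings the count to 19.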

rick
ps: For 10.0, 9.2 and earlier, you need to disable TSO for any network
    interface that supports fewer than 35 transmit fragments. (Transmit
    fragments are variously called *TXSEG*, or *SCAT* or similar in the
    .h file for the device driver.)
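
As a rough aid for the check described in the ps, here is a sketch that
scans the text of a driver header for #define lines whose names contain
TXSEG or SCAT and reports those with a value below 35. The macro names
in SAMPLE_HEADER are hypothetical and not taken from any real driver,
and the pattern only handles plain decimal values, so treat it as a
starting point rather than a reliable tool.

```python
import re

MIN_TX_FRAGMENTS = 35  # threshold quoted in the message above

# Matches lines such as "#define FOO_MAX_SCATTER 32" (decimal values only).
DEFINE_RE = re.compile(
    r'^\s*#\s*define\s+(\w*(?:TXSEG|SCAT)\w*)\s+(\d+)\b',
    re.MULTILINE,
)


def find_segment_limits(header_text):
    """Return {macro_name: value} for every TXSEG/SCAT-style define found."""
    return {name: int(value) for name, value in DEFINE_RE.findall(header_text)}


def limits_below_threshold(header_text):
    """Return only the defines whose value is below MIN_TX_FRAGMENTS."""
    limits = find_segment_limits(header_text)
    return {name: v for name, v in limits.items() if v < MIN_TX_FRAGMENTS}


# Hypothetical header contents, for illustration only.
SAMPLE_HEADER = """
#define FOO_MAX_SCATTER 32
#define BAR_NTXSEGS 40
#define FOO_TXD 1024
"""

if __name__ == "__main__":
    print(limits_below_threshold(SAMPLE_HEADER))
```

To use it on a real driver, read the header file into a string and pass
it to limits_below_threshold(); any name it reports suggests running
that interface with TSO disabled.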

> 	---Mike
> 
> 
> 
> 
> --
> -------------------
> Mike Tancsa, tel +1 519 651 3400
> Sentex Communications, mike at sentex.net
> Providing Internet services since 1994 www.sentex.net
> Cambridge, Ontario Canada   http://www.tancsa.com/