svn commit: r267935 - head/sys/dev/e1000 (with work around?)

Sat Sep 13 21:20:43 UTC 2014

Mike Tancsa wrote:
> On 9/12/2014 7:33 PM, Rick Macklem wrote:
> > I wrote:
> >> The patches are in 10.1. I thought his report said 10.0 in the message.
> >>
> >> If Mike is running a recent stable/10 or releng/10.1, then it has been
> >> patched for this and NFS should work with TSO enabled. If it doesn't,
> >> then something else is broken.
> > Oops, I looked and I see Mike was testing r270560 (which would have both
> > the patches). I don't have an explanation why TSO and 64K rsize, wsize
> > would cause a hang, but does appear it will exist in 10.1 unless it
> > gets resolved.
> >
> > Mike, one difference is that, even with the patches the driver will be
> > copying the transmit mbuf list via m_defrag() to 32 MCLBYTE clusters
> > when using 64K rsize, wsize.
> > If you can reproduce the hang, you might want to look at how many mbuf
> > clusters are allocated. If you've hit the limit, then I think that
> > would explain it.
> 
> I have been running the test for a few hrs now and no lockups of the 
> nic, so doing the nfs mount with -orsize=32768,wsize=32768 certainly 
> seems to work around the lockup.   How do I check the mbuf clusters ?
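
One way to check, sketched here under the assumption that "netstat -m"
on your system prints the usual "mbuf clusters in use
(current/cache/total/max)" line. The helper name cluster_usage is made up
for illustration and is not part of FreeBSD:

```shell
# Sketch only: report how close the system is to the mbuf cluster limit.
# Assumes `netstat -m` prints a line of the form
#   <current>/<cache>/<total>/<max> mbuf clusters in use (current/cache/total/max)
# cluster_usage is a hypothetical helper name.
cluster_usage() {
    awk '/mbuf clusters in use/ {
        split($1, f, "/")   # f[1]=current f[2]=cache f[3]=total f[4]=max
        printf "%d of %d clusters in use (%d%%)\n", f[1], f[4], f[1] * 100 / f[4]
    }'
}

# On the NFS client, while the test is running:
#   netstat -m | cluster_usage
#   netstat -m | grep denied      # non-zero counts mean allocations failed
#   sysctl kern.ipc.nmbclusters   # the configured cluster limit
```

If "current" is close to "max", or the "denied" counts are non-zero, then
the cluster limit has probably been hit.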

Btw, in the past, when reducing rsize,wsize has fixed a problem that
isn't fixed by disabling TSO, the cause has been trouble receiving a
burst of ethernet packets.
I believe this may be a problem with either the receive ring size or
interrupt latency (testers have reported cases where changing the way
the device driver uses interrupts has fixed the problem so that it
worked with 64K rsize, wsize).

I have no familiarity with this hardware/driver, so I can't suggest
anything specific to try except maybe changing how interrupts are
handled, if the driver has a sysctl for that.

rick