NFS stalling on 8.1-STABLE

Rick Macklem rmacklem at uoguelph.ca
Sun Aug 15 21:11:02 UTC 2010


> Hi all,
> 
> I have five front end web servers that all mount their content from the same server via NFS.  If I stress the link on any one of the machines (e.g. copy a large directory with a lot of files to/from the mounted file system) the client will pause.  That is, all processes trying to access that mount will freeze.  The log files fill up with hundreds or thousands of "nfs server not responding" / "is alive again" messages. After 60 seconds it returns to normal, unless the load is still there, in which case it continues to pause.
> 
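One way to check whether the freezes really last about 60 seconds is to time the paired log messages. Below is a minimal Python sketch. It assumes the kernel lines look like "nfs server host:/path: not responding" and "nfs server host:/path: is alive again" and follow a standard 15-character syslog timestamp, so check that against the real /var/log/messages first. Syslog lines carry no year, so the caller supplies one (the default of 2010 is only an assumption).

```python
import re
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

# Assumed message shape, e.g.
# "Aug 15 12:00:01 web1 kernel: nfs server fs1:/export: not responding"
MSG = re.compile(r"nfs server (?P<srv>\S+): (?P<state>not responding|is alive again)")

def stall_durations(lines: Iterable[str], year: int = 2010) -> List[Tuple[str, float]]:
    """Pair each 'not responding' with the next 'is alive again' for the
    same server and return (server, stall length in seconds)."""
    down_since: Dict[str, datetime] = {}
    stalls: List[Tuple[str, float]] = []
    for line in lines:
        m = MSG.search(line)
        if not m:
            continue
        # The first 15 characters are the syslog timestamp ("Aug 15 12:00:01").
        when = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        srv = m.group("srv")
        if m.group("state") == "not responding":
            # Keep the earliest "not responding" of a burst of repeats.
            down_since.setdefault(srv, when)
        elif srv in down_since:
            stalls.append((srv, (when - down_since.pop(srv)).total_seconds()))
    return stalls
```

Stall lengths that cluster tightly around 60 seconds would fit the reconnect theory better than a random spread would.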

The 60sec delay suggests that the client is doing a TCP reconnect. I'd suggest that you
look at a packet trace in wireshark (it knows how to decode NFS packets) and see if
there are new TCP connections (SYN, SYN-ACK,...) being made. If that is what is
happening, I suspect it is NIC driver related, but it is really hard to say.
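Part of that check can be scripted. The minimal Python sketch below lists every TCP SYN (without ACK) sent to the NFS port in a saved trace, along with the gap since the previous one. Repeated SYNs roughly 60 seconds apart would point at reconnects. It makes several assumptions. The trace is in classic libpcap format, which is what tcpdump -w writes; newer Wireshark versions default to pcapng, which this does not parse. Frames are untagged Ethernet/IPv4. NFS runs over TCP port 2049. The script name nfs_syns.py used below is hypothetical.

```python
import struct
import sys
from typing import BinaryIO, List, Tuple

NFS_PORT = 2049            # assumed: standard NFS port, mount using TCP
ETHERTYPE_IPV4 = 0x0800
IPPROTO_TCP = 6
TCP_SYN, TCP_ACK = 0x02, 0x10

def nfs_connection_attempts(f: BinaryIO) -> List[Tuple[float, str, int]]:
    """Return (timestamp, client_ip, client_port) for each SYN sent to NFS_PORT."""
    magic = f.read(4)
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"                       # little-endian, microsecond timestamps
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"                       # big-endian, microsecond timestamps
    else:
        raise ValueError("not a classic microsecond libpcap file")
    linktype = struct.unpack(endian + "HHiIII", f.read(20))[5]
    if linktype != 1:                      # 1 = Ethernet (DLT_EN10MB)
        raise ValueError("only Ethernet captures are handled")

    attempts = []
    while True:
        hdr = f.read(16)
        if len(hdr) < 16:
            break                          # end of capture
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(endian + "IIII", hdr)
        pkt = f.read(incl_len)
        # Skip short frames and anything that is not plain IPv4 (VLAN tags included).
        if len(pkt) < 34 or struct.unpack("!H", pkt[12:14])[0] != ETHERTYPE_IPV4:
            continue
        ip = pkt[14:]
        ihl = (ip[0] & 0x0F) * 4           # IPv4 header length in bytes
        if ip[9] != IPPROTO_TCP or len(ip) < ihl + 14:
            continue
        tcp = ip[ihl:]
        src_port, dst_port = struct.unpack("!HH", tcp[0:4])
        flags = tcp[13]
        if dst_port == NFS_PORT and flags & TCP_SYN and not flags & TCP_ACK:
            client = ".".join(str(b) for b in ip[12:16])
            attempts.append((ts_sec + ts_usec / 1e6, client, src_port))
    return attempts

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as fh:
        prev = None
        for ts, client, port in nfs_connection_attempts(fh):
            gap = "" if prev is None else f" (+{ts - prev:.1f}s)"
            print(f"{ts:.6f} SYN from {client}:{port}{gap}")
            prev = ts
```

Usage would be something like "python3 nfs_syns.py trace.pcap". Wireshark's own display filter on the SYN flag shows the same packets interactively. The script only makes the timing between them easier to read.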

If you can try a network interface of a different type (not em), that will show
whether it is an em(4) issue.

Alternately, you could try turning off TSO and checksum offload on the
em(4) interface and see if that helps.
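Below is a minimal Python sketch of that change. It assumes the interface is named em0, so adjust it to the real name. The sketch only wraps ifconfig(8). There, -tso disables TCP segmentation offload, and -txcsum/-rxcsum disable transmit and receive checksum offload. By default it only builds the command. Actually running it needs root, and the setting is lost at reboot unless the same flags are also added to the interface's ifconfig_em0 line in rc.conf.

```python
import subprocess
from typing import List

# ifconfig(8) capability flags that turn the offloads off.
OFFLOAD_OFF_FLAGS = ["-tso", "-txcsum", "-rxcsum"]

def offload_off_command(ifname: str) -> List[str]:
    """Build the ifconfig command that disables TSO and checksum offload."""
    return ["ifconfig", ifname, *OFFLOAD_OFF_FLAGS]

def disable_offload(ifname: str, dry_run: bool = True) -> List[str]:
    """Return the command, and run it only when dry_run is False (needs root)."""
    cmd = offload_off_command(ifname)
    if not dry_run:
        subprocess.run(cmd, check=True)   # raises CalledProcessError on failure
    return cmd
```

After disable_offload("em0", dry_run=False), the "options=" line printed by "ifconfig em0" should no longer list TSO4, TXCSUM or RXCSUM.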

> This has only started happening since I upgraded the client machines to 8.1-STABLE (previously four of them were 8.0 and one was 7.3).  The server is 7.1-RELEASE-p11.  No other changes have taken place in terms of hardware or software or mount options, etc.
> 

There were some client side fixes between 8.0 and 8.1, but I don't think any
of those have caused a regression w.r.t. connections. There is a problem with
the nfsd getting into a loop, but that wouldn't recover after 60sec. (If it happens,
the server has to be rebooted. There is a fix for this at:
   http://people.freebsd.org/~rmacklem/freebsd8.1-patches/replay.patch
but I don't think it is what you are seeing.)

rick

