process stuck in nfsfsync state
Robert Watson
rwatson at freebsd.org
Mon Oct 25 05:26:19 PDT 2004
On Mon, 25 Oct 2004, Joan Picanyol wrote:
> > Is there a response to the request? If not, that might suggest the
> > server is wedged, not the client. If you are willing to share the
> > output of tcpdump -s 1500 -w <whatever> from a few seconds during the
> > wedge, that would be very useful.
>
> Available at http://biaix.org/pk/debug/nfs/ These are from just after
> logging in to GNOME until gconfd-2 goes to nfsfsync and the "nfs server
> not responding" messages start appearing.
Comparing the client and server traces, it looks like fragments in the
client-generated writes are being lost. For example, frame 4175 in the
client trace is a fragmented NFSv3 write over UDP. The total datagram
size is 8192, but it's broken down into six IP fragments:
Frame IP offset Length Arrived?
4175 0 1480 Yes
4176 1480 1480 Yes
4177 2960 1480 Yes
4178 4440 1480 Yes
4179 5920 1480 No
4180 7400 944 Yes
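The kind of gap shown in the table above can be found mechanically rather than by eyeballing frame lists. A minimal sketch (not tied to any particular capture tool; it just takes the per-fragment offset/length/more-fragments values you would read out of a decoded trace):

```python
def missing_fragments(frags):
    """Given (offset, length, more_fragments) tuples for the fragments of
    one IP datagram that were actually seen, return the byte ranges that
    never arrived.  Offsets are in bytes, as tcpdump prints them."""
    frags = sorted(frags)
    gaps = []
    expected = 0        # next byte offset we expect to see
    total = None        # datagram length, known once the last fragment arrives
    for off, length, more in frags:
        if off > expected:
            gaps.append((expected, off))
        expected = max(expected, off + length)
        if not more:
            total = off + length
    if total is not None and expected < total:
        gaps.append((expected, total))
    return gaps

# The fragments that did arrive in frames 4175-4180 (4179 is absent):
seen = [(0, 1480, 1), (1480, 1480, 1), (2960, 1480, 1),
        (4440, 1480, 1), (7400, 944, 0)]
print(missing_fragments(seen))   # [(5920, 7400)] -- frame 4179's bytes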
Without the missing fragments, the datagrams (and hence RPCs) can't be
reassembled, and with 6-fragment datagrams, even fairly low probability
loss for individual packets adds up (or multiplies up!). So the question
is: where are your fragments going?
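To put a number on the "multiplies up" point, here is a back-of-envelope sketch, assuming independent per-fragment loss (real loss is often bursty, so this is only indicative):

```python
def datagram_loss(per_fragment_loss: float, fragments: int) -> float:
    """Probability that at least one of `fragments` IP fragments is lost,
    which makes the whole datagram (and hence the RPC) unreassemblable."""
    survive_all = (1.0 - per_fragment_loss) ** fragments
    return 1.0 - survive_all

# A 1% per-packet loss rate becomes ~5.9% loss for 6-fragment writes:
print(f"{datagram_loss(0.01, 6):.4f}")   # 0.0585
```

So even modest link-level loss is amplified roughly sixfold for these 8K writes, and every retransmission resends the whole six-fragment datagram.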
Since the fragments all ended up in the BPF trace on the client, we know
that sufficient mbufs could be allocated on that side to build not only
the datagram but also the fragment chain, and to insert it into the
interface queue without an overflow; they could still have been dropped at
a low level in the driver. Since they don't appear, even corrupted, in
the server trace, we know they either never reached the server or were
dropped very early in the server's driver. A drop in the server's IP
stack would occur after the packet had been submitted to BPF, so it would
still show up in the trace. So if possible, I might try some of the
following:
- Substituting a different switch or hub between the two systems, and
looking for possible chronic sources of packet loss between them.
- If possible, getting a trace of the packets on an intermediate node to
see whether the packets were really sent or not. Maybe on a monitor
port on the switch, or by inserting a bridging node. My suspicion is
either that the sender is dropping them at a low level in the driver,
perhaps due to a resource leak, or that they're dropped on the way
through an intermediate node. Maybe something is particularly sensitive
to the rapid sequential send of the 6 fragments.
- Perhaps instrumenting the device drivers on the sender and recipient to
look for possible areas where packet drops are being triggered.
- I think someone already suggested disabling hardware checksumming; if
  you haven't tried that yet, it would be worth trying.
- It would be useful to see whether less complicated NFS meta-transactions
  than "Start GTK" can trigger the problem. For example, doing a large dd
  to a file over NFS, varying the block size to see if you can find useful
  thresholds that trigger the problem. I see a lot of successful 512-byte
  writes in the trace, but the larger 8192-byte write datagrams seem to
  have problems.
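For the dd experiment, it may help to know how many fragments each write size should produce. A rough sketch, assuming a 1500-byte MTU and the ~152 bytes of UDP plus RPC/NFS WRITE header overhead inferred from the trace above (8344 bytes of IP payload for an 8192-byte write); the overhead figure is an estimate, not a protocol constant:

```python
import math

MTU = 1500
IP_HDR = 20
FRAG_PAYLOAD = MTU - IP_HDR   # 1480 bytes of IP payload per fragment

# UDP header + RPC/NFS WRITE headers, estimated from the trace (8344 - 8192).
OVERHEAD = 152

def frag_count(wsize: int) -> int:
    """Number of IP fragments a UDP NFSv3 write of `wsize` bytes produces."""
    return math.ceil((wsize + OVERHEAD) / FRAG_PAYLOAD)

for wsize in (512, 1024, 2048, 4096, 8192):
    print(f"wsize={wsize:5d} -> {frag_count(wsize)} fragment(s)")
```

If the problem appears only once writes cross into the 2-3 fragment range, that would point at fragmentation (or rapid back-to-back sends) rather than datagram size per se.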
Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
robert at fledge.watson.org Principal Research Scientist, McAfee Research
More information about the freebsd-stable mailing list