process stuck in nfsfsync state
Robert Watson
rwatson at freebsd.org
Mon Oct 25 05:26:19 PDT 2004
On Mon, 25 Oct 2004, Joan Picanyol wrote:
> > Is there a response to the request? If not, that might suggest the
> > server is wedged, not the client. If you are willing to share the
> > output of tcpdump -s 1500 -w <whatever> from a few seconds during the
> > wedge, that would be very useful.
>
> Available at http://biaix.org/pk/debug/nfs/ These are from just after
> logging in to GNOME until gconfd-2 goes to nfsfsync and the "nfs server
> not responding" messages start appearing.
Comparing the client and server traces, it looks like fragments in the
client-generated writes are being lost. For example, frame 4175 in the
client trace is a fragmented NFSv3 write over UDP. The total datagram
size is 8192, but it's broken down into six IP fragments:
Frame IP offset Length Arrived?
4175 0 1480 Yes
4176 1480 1480 Yes
4177 2960 1480 Yes
4178 4440 1480 Yes
4179 5920 1480 No
4180 7400 944 Yes
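The kind of gap shown in the table above can be found mechanically rather than by eyeballing frame lists. A minimal sketch (not tied to any particular capture tool; it just takes the per-fragment offset/length/more-fragments values you would read out of a decoded trace):

```python
def missing_fragments(frags):
    """Given (offset, length, more_fragments) tuples for the fragments of
    one IP datagram that were actually seen, return the byte ranges that
    never arrived.  Offsets are in bytes, as tcpdump prints them."""
    frags = sorted(frags)
    gaps = []
    expected = 0        # next byte offset we expect to see
    total = None        # datagram length, known once the last fragment arrives
    for off, length, more in frags:
        if off > expected:
            gaps.append((expected, off))
        expected = max(expected, off + length)
        if not more:
            total = off + length
    if total is not None and expected < total:
        gaps.append((expected, total))
    return gaps

# The fragments that did arrive in frames 4175-4180 (4179 is absent):
seen = [(0, 1480, 1), (1480, 1480, 1), (2960, 1480, 1),
        (4440, 1480, 1), (7400, 944, 0)]
print(missing_fragments(seen))   # [(5920, 7400)] -- frame 4179's bytes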
Without the missing fragments, the datagrams (and hence RPCs) can't be
reassembled, and with 6-fragment datagrams, even fairly low probability
loss for individual packets adds up (or multiplies up!). So the question
is: where are your fragments going?
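To put a number on the "multiplies up" point, here is a back-of-envelope sketch, assuming independent per-fragment loss (real loss is often bursty, so this is only indicative):

```python
def datagram_loss(per_fragment_loss: float, fragments: int) -> float:
    """Probability that at least one of `fragments` IP fragments is lost,
    which makes the whole datagram (and hence the RPC) unreassemblable."""
    survive_all = (1.0 - per_fragment_loss) ** fragments
    return 1.0 - survive_all

# A 1% per-packet loss rate becomes ~5.9% loss for 6-fragment writes:
print(f"{datagram_loss(0.01, 6):.4f}")   # 0.0585
```

So even modest link-level loss is amplified roughly sixfold for these 8K writes, and every retransmission resends the whole six-fragment datagram.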
Since the fragments all ended up in the BPF trace on the client, we know
that sufficient mbufs could be allocated on that side to build not only
the datagram but also the fragment chain, and to insert it into the
interface queue without an overflow; they could still have been dropped at
a low level in the driver. Since they don't appear, even corrupted, in
the server trace, we know they either never reached the server or were
dropped very early in the server's driver. A drop in the server's IP
stack would occur after the packet had been submitted to BPF, so it would
still show up in the trace. So if possible, I might try some of the
following:
- Substituting a different switch or hub between the two systems, and
looking for possible chronic sources of packet loss between them.
- If possible, getting a trace of the packets on an intermediate node to
see whether the packets were really sent or not. Maybe on a monitor
port on the switch, or by inserting a bridging node. My suspicion is
either that the sender is dropping them at a low level in the driver,
perhaps due to a resource leak, or that they're dropped on the way
through an intermediate node. Maybe something is particularly sensitive
to the rapid sequential send of the 6 fragments.
- Perhaps instrumenting the device drivers on the sender and recipient to
look for possible areas where packet drops are being triggered.
- I think someone already suggested disabling hardware checksumming; if
  you haven't tried that yet, it would be worth trying.
- It would be useful to see whether less complicated NFS meta-transactions
  than "Start GTK" can trigger the problem. For example, doing a large dd
  to a file over NFS, varying the block size to see if you can find useful
  thresholds that trigger the problem. I see a lot of successful 512-byte
  writes in the trace, but the larger 8192-byte write datagrams seem to
  have problems.
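For the dd experiment, it may help to know how many fragments each write size should produce. A rough sketch, assuming a 1500-byte MTU and the ~152 bytes of UDP plus RPC/NFS WRITE header overhead inferred from the trace above (8344 bytes of IP payload for an 8192-byte write); the overhead figure is an estimate, not a protocol constant:

```python
import math

MTU = 1500
IP_HDR = 20
FRAG_PAYLOAD = MTU - IP_HDR   # 1480 bytes of IP payload per fragment

# UDP header + RPC/NFS WRITE headers, estimated from the trace (8344 - 8192).
OVERHEAD = 152

def frag_count(wsize: int) -> int:
    """Number of IP fragments a UDP NFSv3 write of `wsize` bytes produces."""
    return math.ceil((wsize + OVERHEAD) / FRAG_PAYLOAD)

for wsize in (512, 1024, 2048, 4096, 8192):
    print(f"wsize={wsize:5d} -> {frag_count(wsize)} fragment(s)")
```

If the problem appears only once writes cross into the 2-3 fragment range, that would point at fragmentation (or rapid back-to-back sends) rather than datagram size per se.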
Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
robert at fledge.watson.org Principal Research Scientist, McAfee Research
More information about the freebsd-stable mailing list