FreeBSD NFS client goes into infinite retry loop

Fri Mar 19 11:34:24 UTC 2010

Hi, we use a FreeBSD 8-STABLE (from shortly after release) system as an 
NFS server to provide user home directories which get mounted across a 
few machines (all 6.3-RELEASE).  For the past few weeks we have been 
running into problems where one particular client goes into an 
infinite loop, repeatedly retrying a write to which the NFS server 
responds with "reply ok 40 write ERROR: Input/output error 
PRE: POST:".  This retry loop can generate between 20 Mbps and 500 Mbps 
of constant traffic on our network, depending on the size of the data 
associated with the failed write.

We spent some time on the issue and determined that something on one of 
the clients is deleting a file as it is being written to by another NFS 
client.  We were able to enable the NFS lockmgr and use lockf(1) to fix 
most of these conditions, and the frequency of this problem has dropped 
from once a night to once a week.  However, it's still a problem and we 
can't necessarily force all of our users to "play nice" and use lockf/flock.
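
For illustration, here is a minimal sketch of the kind of cooperative
locking described above, written in Python rather than with lockf(1).
The sidecar ".lock" file naming and the helper names sidecar_lock,
write_locked and remove_locked are assumptions made for this example,
not something our users actually run.  It only helps if every writer
and every deleter goes through the same lock, and over NFS it relies on
the lock manager being enabled on both client and server.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def sidecar_lock(path):
    """Hold an exclusive advisory lock on a sidecar file, path + '.lock'.

    The sidecar naming is a hypothetical convention for this sketch.
    """
    fd = os.open(path + ".lock", os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX)  # blocks until the lock is granted
        try:
            yield
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


def write_locked(path, data):
    """Write bytes to path while holding the sidecar lock."""
    with sidecar_lock(path):
        with open(path, "wb") as f:
            f.write(data)


def remove_locked(path):
    """Remove path while holding the sidecar lock; a missing file is fine."""
    with sidecar_lock(path):
        try:
            os.remove(path)
        except FileNotFoundError:
            pass
```

Locking a separate file rather than the data file itself means the lock
survives the data file being removed and recreated.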

Has anyone seen this before?  No errors are being logged on the NFS 
server itself, but the "Server Ret-Failed" counter begins to increase 
rapidly whenever a client gets stuck in this infinite retry loop:
Server Ret-Failed
         224768961
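
To put a number on "increase rapidly", the counter can be sampled twice
and turned into a rate.  The sketch below assumes the two-line layout
shown in the excerpt above (a "Server Ret-Failed" header followed by
the value on the next line); real nfsstat output may have more columns,
and the function names are made up for this example.

```python
import re


def parse_ret_failed(text):
    """Return the integer under a 'Server Ret-Failed' header, or None.

    Assumes the value sits alone on the line after the header, as in
    the excerpt above.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines[:-1]):
        if line.split() == ["Server", "Ret-Failed"]:
            m = re.match(r"\s*(\d+)\s*$", lines[i + 1])
            if m:
                return int(m.group(1))
    return None


def failures_per_second(earlier, later, interval_seconds):
    """Average growth of the counter between two samples."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return (later - earlier) / interval_seconds
```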

I have a feeling that using NFS in such a manner may simply be prone to 
such problems, but what confuses me is why the NFS client system is 
infinitely retrying the write operation and causing itself so much grief.

Thanks for any suggestions anyone can provide,
Steve Polyack