NFS problems, locking up

Wed Jan 12 12:57:14 PST 2005

On Wed, 12 Jan 2005, Daniel Eriksson wrote:

> I'm still having problems with NFS locking up when moving large amounts
> of data over it on 6-CURRENT from 2005.01.11.05.00.00. This problem has
> persisted for a long time now, and the only thing that seems to cure it
> is running the network stack with Giant enabled (debug.mpsafenet=0).
> 
> When it happens, the process doing the copying ends up in "nfsaio" state
> according to ps. Any access to the locked mount by other processes
> ends up waiting forever in state "nfs". I have multiple file systems
> mounted from the same server, and only the mount where the data is being
> moved locks up.  The others continue to work as expected. 
> 
> Server: UP, 6-CURRENT from 2005.01.11.05.00.00, if_vr (POLLING)
> Client: SMP (dual AMD MP), 6-CURRENT from 2005.01.11.05.00.00, if_em
> 
> The machines are connected with a crossover cable. I've tried both
> schedulers (4BSD and ULE) on the client, but it doesn't make any
> difference (server is running 4BSD). PREEMPTION is enabled on both
> server and client.  ADAPTIVE_GIANT is enabled on the client. 

If you run with INVARIANTS and WITNESS, does anything useful get printed
out?  Does it make a difference if you run the client with a UP kernel?
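
For reference, a minimal sketch of the kernel configuration options
involved, assuming a custom config file derived from GENERIC (the exact
set in your config may differ, and the debugger options are only my
assumption about what you will want alongside the first three):

    options     INVARIANTS          # extra run-time consistency checks
    options     INVARIANT_SUPPORT   # support code required by INVARIANTS
    options     WITNESS             # lock order tracking and reporting
    options     KDB                 # kernel debugger framework
    options     DDB                 # interactive in-kernel debugger

    # For the UP test on the client, comment out SMP support:
    #options    SMP

Expect a noticeable slowdown with INVARIANTS and WITNESS compiled in,
which may itself change the timing of the hang.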

If you break to the debugger on the client and server once wedging has
occurred, what does "show lockedvnods" and "show alllocks" show?
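
As a rough sketch, assuming the kernel was built with KDB and DDB and
you have access to the system console, the session might look like the
following ("ps" is an extra I am suggesting here, to capture the process
states at the same time):

    # sysctl debug.kdb.enter=1
    db> show lockedvnods
    db> show alllocks
    db> ps
    db> continue

Note that "show alllocks" is only available when WITNESS is compiled
into the kernel.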

Is there any chance you could attach a second NFS client to the
configuration, wedge the file system from the first client, and then try
the second client and see if it experiences immediate problems?

Robert N M Watson