NFS Locking Issue

Tue Jul 4 13:40:35 UTC 2006

On Mon, Jul 03, 2006 at 03:40:01PM -0700, Michael Collette wrote:
> User Freebsd wrote:
> >On Sat, 1 Jul 2006, Francisco Reyes wrote:
> >
> >>John Hay writes:
> >>
> >>>I only started to see the lockd problems when upgrading the server side
> >>>to FreeBSD 6.x and later. I had various FreeBSD clients, between 4.x
> >>>and 7-current and the lockd problem only showed up when upgrading the
> >>>server from 5.x to 6.x.
> >>
> >>It confirms the same we are experiencing.. constant freezing/locking 
> >>issues.
> >>I guess no more 6.X for us.. for the foreseable future..
> >
> >Since there are several of us experiencing what looks to be the same 
> >sort of deadlock issue, I beseech you not to give up
> 
> Honestly trying not to.  To tell ya the truth, I've been giving a real 
> hard look at Ubuntu for my serving needs.  This NFS thing has got me 
> seriously questioning FreeBSD right at the moment.
> 
> >... right now, all 
> >we've been able to get to the developers is virtually useless 
> >information (vmstat and such shows the problem, but it doesn't allow 
> >developers to identify the problem) ...
> >
> >Is this a problem that you can easily recreate, even on a non-production 
> >machine?
> 
> Oh yeah.  I've got a couple of ways I'm able to get this to fail.
> 
> Method #1:
> ---------------------------------------------------------------------
> Let's start with the simplest.  The scenario here involves 2 machines, 
> mach01 and mach02.  Both are running 6-STABLE, and both are running 
> rpcbind, rpc.statd, and rpc.lockd.  mach01 has exported /documents and 
> mach02 is mounting that export under /mnt.  Simple enough?
> 
> The /documents directory has multiple subdirectories and files of 
> various sizes.  The actual amount of data doesn't really matter to 
> produce a failure.  All you need to do at this point is to try to copy 
> files from that mount point to somewhere else on the hard drive.
> 
> cp -Rp /mnt/* /tmp/documents/
> 
> You may, or not, see that a couple of subdirectories were created, but 
> no files actually moved over.  The cp command is now locked up, and no 
> traffic moves.  This usually takes a second or two to show up as a 
> problem.  I can repeat this with multiple 6-STABLE boxes.
> 
> Turn off rpc.lockd on either the server or client before the cp command, 
> and things work.

Neither method, as described, is specific enough to reproduce the problem.
As was already said, you should supply a tcpdump capture of the failed NFS
session.

Personally, I tried what you described as method 1 and got no hangs;
everything copied as it should. I ran it between an amd64 6.1-STABLE
client, as of yesterday, and an i386 server running the same STABLE.
Monitoring the lockd interaction with ethereal did not reveal anything
either.

So, here is what you need to provide to help debug the issue:
1. as much detail as possible about the configuration of the problem
machines
2. the exact version of the software you are using
3. a tcpdump of the NFS sessions (I would prefer a raw capture that
can be loaded into ethereal)
4. the rpc.lockd log from both client and server (see the -d option in
the man page).
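
As a rough sketch of how the four items above could be collected, the
script below only prints the commands; it does not run them. The host
name mach01 is the one from the report, while the interface em0, the
output directory /tmp/nfs-debug and the debug level 1 are assumptions
for illustration and should be adjusted to the actual setup.

```shell
#!/bin/sh
# Sketch only: prints the commands, does not run them.
# SERVER, IFACE and OUT are placeholder values, not taken from the report.
SERVER="${SERVER:-mach01}"     # NFS server host name
IFACE="${IFACE:-em0}"          # network interface to capture on (assumed)
OUT="${OUT:-/tmp/nfs-debug}"   # directory for the collected files (assumed)

print_debug_commands() {
    cat <<EOF
# items 1 and 2, on both client and server:
mkdir -p $OUT
uname -a > $OUT/uname.txt
mount > $OUT/mounts.txt

# item 3, on the client, started before the failing cp (stop with Ctrl-C):
tcpdump -i $IFACE -s 0 -w $OUT/nfs.pcap host $SERVER

# item 4, on both client and server, after stopping the running rpc.lockd:
rpc.lockd -d 1
EOF
}

print_debug_commands
```

The -s 0 and -w options make tcpdump write full, raw packets to a file
that ethereal can open. With -d, rpc.lockd should send its debugging
output to syslog, so the relevant part of the system log would need to
be attached as well.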

The issue seems to be highly specific to some configuration details.
I, for instance, am unable to reproduce it on my debug testbench.
Without help from the users experiencing the trouble, it could take
forever to kill this bug.