kern/107555: [rpc] 30 second delay in NFS lock of file after waiting for lock

Doug Rudoff joseph.blough at yahoo.com
Thu Feb 8 01:20:25 UTC 2007


The following reply was made to PR kern/107555; it has been noted by GNATS.

From: Doug Rudoff <joseph.blough at yahoo.com>
To: bug-followup at FreeBSD.org
Cc:  
Subject: Re: kern/107555: [rpc] 30 second delay in NFS lock of file after waiting for lock
Date: Wed, 7 Feb 2007 16:48:01 -0800 (PST)

 I've discovered what's happening.
 
 On this particular Linux client, an "rpcinfo -p" showed
 no registered nfs rpc services, including the
 important "nlockmgr". This was despite nfs and lockd
 running on the Linux client.
 
 On the FreeBSD side, when the original lock is
 released by the first client app, lockd then attempts
 to send an NLM_GRANTED to the waiting second client
 app. But with nlockmgr not a registered rpc service,
 lockd is not able to create an rpc client handle and
 thus is unable to send the message. However, lockd
 does no error checking after attempting to send the
 granted message and assumes the message was sent
 successfully. At this point lockd has the file locked
 on behalf of a client that is unaware it has the lock.
 
 The waiting Linux client app gives up waiting for the
 NLM_GRANTED from FreeBSD's lockd after a set period
 and sends a new lock request. Since lockd is already
 holding the lock on the file for that client, the lock
 is granted.
 
 When I restarted nfs on the Linux client, nlockmgr was
 listed as an rpc service, and the 30 second delay in
 getting a lock did not occur.
 
 You may wonder how any other messages are returned to
 the client if the client's rpc services aren't
 registered. When lockd receives a message, it knows
 the client handle that sent the message and can
 immediately reply to the same handle. But for the
 NLM_GRANTED message, the client handle isn't stored
 with the list of blocked locks, so lockd has to ask
 the client host for the handle through the rpc
 services that are registered.
 
 To sum things up:
 1) The problem was due to the missing nlockmgr rpc
 service on the Linux client.
 2) FreeBSD's lockd assumes it sent an NLM_GRANTED to a
 client waiting for a lock, even if it's unable to send
 the message.
 3) Since lockd assumes it sent the message, lockd
 holds the lock for the client, with the client being
 unaware it has the lock.
 4) Since the client never received the NLM_GRANTED
 while it was waiting for a lock, after 30 seconds it
 asks for the lock again, and receives it because
 lockd is already holding the lock for the client.
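
 The sequence in the four points above can be modelled as a
 small timing sketch. This is only an illustration, not lockd
 code: the function name and parameters are made up, times are
 plain seconds, and it assumes the client retries on a fixed 30
 second period counted from its original request, which is what
 I observed but have not confirmed in the Linux client source.

```c
#include <stdbool.h>

/* Retry period seen on the Linux client in this report (assumed fixed). */
#define CLIENT_RETRY_SECS 30

/*
 * Hypothetical model: returns the time (in seconds) at which the waiting
 * client learns that it holds the lock.
 *
 * request_time - when the client sent its blocking lock request
 * release_time - when the first holder released the lock
 *                (assumed >= request_time)
 * callback_ok  - whether the server can reach the client's nlockmgr
 *                service to deliver NLM_GRANTED
 */
static int client_learns_at(int request_time, int release_time, bool callback_ok)
{
    if (callback_ok) {
        /* NLM_GRANTED is delivered as soon as the lock is released. */
        return release_time;
    }

    /*
     * The server holds the lock for the client but could not say so.
     * The client only finds out on its first retry at or after the
     * release, because the retry is granted immediately.
     */
    int elapsed = release_time - request_time;
    int retries = (elapsed + CLIENT_RETRY_SECS - 1) / CLIENT_RETRY_SECS;
    if (retries < 1)
        retries = 1;
    return request_time + retries * CLIENT_RETRY_SECS;
}
```

 With a request at 0 and a release at 10, the client learns of
 the lock at 10 when the callback works and at 30 when it
 doesn't, which matches the delay described in this PR.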
 
 In send_granted(), if the client handle can't be
 obtained, there's this comment:
 
 "We fail to notify remote that the lock has been
 granted. The client will timeout and retry, the lock
 will be granted at this time."
 
 So, it was clearly intentional to not care if the
 client received the NLM_GRANTED message. This is
 further shown to be the case by the fact that lockd
 does not look for the reply from the client that it
 has accepted the granted lock.
 
 I'm going to suggest that if it is absolutely known
 that the client didn't receive the granted message,
 then the lock should not be granted.
 
 Now this won't remove the delay itself. It will still
 take 30 seconds for the client to time out and request
 the lock again. But during those 30 seconds another
 client could successfully grab the lock.
 Otherwise, if the waiting client dies, lockd will
 still be holding the lock unaware that the client is
 gone and no other client will be able to get the lock.
 
 My suggestion on how to fix this:
 In lockd_lock.c, send_granted() is defined with a void
 return. Change it to an int return, with a -1 returned
 if the client handle was not obtained, and 0 if the
 message was sent.
 
 In retry_blockingfilelocklist(), if send_granted()
 returns a -1, then the initial lock request is denied
 and the client will have to ask for the lock again.
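
 A minimal sketch of that change, to show the idea rather than
 the actual patch. The struct and the "_sketch" functions are
 stand-ins I made up for illustration; they are not the real
 struct file_lock, send_granted() or
 retry_blockingfilelocklist(), and the callback_ok flag simply
 models whether an rpc client handle could be created. Offering
 the lock to the next waiter on failure is my assumption about
 how the retry loop would behave.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a blocked lock entry (not struct file_lock). */
struct blocked_lock {
    const char *client_host;      /* host waiting for the lock */
    bool callback_ok;             /* models: can a client handle be created? */
    bool held;                    /* true while lockd holds the lock for it */
    struct blocked_lock *next;    /* next waiter in the blocked list */
};

/*
 * Sketch of send_granted() with an int return:
 * 0 if the message was sent, -1 if no client handle could be obtained.
 */
static int send_granted_sketch(const struct blocked_lock *bl)
{
    if (!bl->callback_ok)
        return -1;
    /* The real code would send NLM_GRANTED to the client here. */
    return 0;
}

/*
 * Sketch of the retry loop: a waiter only keeps the lock if it could be
 * notified. Returns the waiter that now holds the lock, or NULL.
 */
static struct blocked_lock *retry_blocked_sketch(struct blocked_lock *head)
{
    for (struct blocked_lock *bl = head; bl != NULL; bl = bl->next) {
        bl->held = true;                  /* tentatively take the lock */
        if (send_granted_sketch(bl) == 0)
            return bl;                    /* client knows it has the lock */
        bl->held = false;                 /* deny: client must ask again */
    }
    return NULL;
}
```

 The waiter that can't be reached is left without the lock, so
 a dead or misconfigured client no longer blocks everyone else.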
 
 I was thinking an alternative fix would be to add the
 client handle to struct file_lock. But reading the
 comments before get_client() in lock_proc.c gives good
 reasons why you don't want to do that (e.g. the client
 host reboots and the client handle is no longer
 valid).
 
 I have created a patch, but until I can get a Linux
 client into the state where nfs and lockd are running
 on it but not listed in the rpc registry I won't be
 able to test it exactly (although I could do a test by
 altering the code so that send_granted() always
 failed).
 
 
 
  


More information about the freebsd-bugs mailing list