kern/107555: [rpc] 30 second delay in NFS lock of file after
waiting for lock
Doug Rudoff
joseph.blough at yahoo.com
Thu Feb 8 01:20:25 UTC 2007
The following reply was made to PR kern/107555; it has been noted by GNATS.
From: Doug Rudoff <joseph.blough at yahoo.com>
To: bug-followup at FreeBSD.org
Cc:
Subject: Re: kern/107555: [rpc] 30 second delay in NFS lock of file after waiting for lock
Date: Wed, 7 Feb 2007 16:48:01 -0800 (PST)
I've discovered what's happening.
On this particular Linux client, a "rpcinfo -p" showed
no registered nfs rpc services, including the
important "nlockmgr". This was despite nfs and lockd
running on the Linux client.
On the FreeBSD side, when the original lock is
released by the first client app, lockd then attempts
to send a NLM_GRANTED to the waiting second client
app. But with nlockmgr not a registered rpc service,
lockd is not able to create an rpc client handle and
thus is unable to send the message. However, lockd
does no error checking after attempting to send the
granted message and assumes the message was sent
successfully. At this point lockd has the file locked
for a client that is unaware it holds the lock.
The waiting Linux client app gives up waiting for the
NLM_GRANTED from FreeBSD's lockd after a set period
and sends a new lock request. Since lockd is already
holding the lock on the file for the client, the lock
is granted.
When I restarted nfs on the Linux client, nlockmgr was
listed as an rpc service, and the 30 second delay in
getting a lock did not occur.
You may wonder how any other messages are returned to
the client if the client's rpc services aren't
registered. When lockd receives a message, it
knows the client handle that sent the message and can
immediately reply to the same handle. But for the
NLM_GRANTED message, the client handle isn't stored
with the blocked lock request, so lockd has to ask the
client host for the handle through the rpc services
that are registered.
To sum things up:
1) The problem was due to the missing nlockmgr rpc
service on the Linux client.
2) FreeBSD's lockd assumes it sent an NLM_GRANTED to a
client waiting for a lock, even if it's unable to send
the message.
3) Since lockd assumes it sent the message, lockd
holds the lock for the client, with the client being
unaware it has the lock.
4) Since the client never received the NLM_GRANTED
while it was waiting for a lock, after 30 seconds it
asks for the lock again, and receives it because
lockd is already holding the lock for the client.
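The sequence above can be modelled with a small sketch. This is
not lockd code; the struct, the function name and the
SKETCH_RETRY_SECS constant are made up for illustration, and the
30 second value is just the retry interval observed in this
report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_RETRY_SECS 30	/* retry interval seen in this PR (assumption) */

/* Hypothetical model of the two views of one lock. */
struct sketch_state {
	bool server_holds_for_client;	/* what lockd believes */
	bool client_knows;		/* what the client believes */
};

/*
 * Returns how many seconds pass, after the first holder releases the
 * lock, before the waiting client learns that it owns the lock.
 */
static int
sketch_wait_after_release(struct sketch_state *st, bool nlockmgr_registered)
{
	int waited = 0;

	assert(st != NULL);

	/* lockd takes the lock for the waiter and tries to notify it. */
	st->server_holds_for_client = true;
	/* The NLM_GRANTED callback only arrives if nlockmgr is registered. */
	st->client_knows = nlockmgr_registered;

	if (!st->client_knows) {
		/*
		 * The client times out and asks again; lockd already
		 * holds the lock for it, so the retry succeeds at once.
		 */
		waited += SKETCH_RETRY_SECS;
		st->client_knows = st->server_holds_for_client;
	}
	return (waited);
}
```

Calling sketch_wait_after_release() with nlockmgr_registered set
to false gives the 30 second delay; with true there is no extra
wait.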
In send_granted(), if the client handle can't be
obtained, there's this comment:
"We fail to notify remote that the lock has been
granted. The client will timeout and retry, the lock
will be granted at this time."
So, it was clearly intentional to not care if the
client received the NLM_GRANTED message. This is
further shown to be the case by the fact that lockd
does not look for the reply from the client that it
has accepted the granted lock.
I'm going to suggest that if it is absolutely known
that the client didn't receive the granted message,
then the lock should not be granted.
Now this won't fix the delay itself. It
will still take 30 seconds for the client to timeout
and request the lock again. But during those 30
seconds another client could successfully grab a lock.
Otherwise, if the waiting client dies, lockd will
still be holding the lock unaware that the client is
gone and no other client will be able to get the lock.
My suggestion on how to fix this:
In lockd_lock.c, send_granted() is defined with a void
return. Change it to an int return, with a -1 returned
if the client handle was not obtained, and 0 if the
message was sent.
In retry_blockingfilelocklist(), if send_granted()
returns a -1, then the initial lock request is denied
and the client will have to ask for the lock again.
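A rough sketch of the shape of that change, not the actual patch.
The struct and the sketch_ names below are stand-ins I made up for
illustration; the real send_granted() and
retry_blockingfilelocklist() in lockd_lock.c take different
arguments and do much more, and the client_reachable flag only
models whether the client handle could be obtained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for lockd's per-lock bookkeeping. */
struct sketch_lock {
	bool client_reachable;	/* models: client handle can be obtained */
	bool held;		/* lockd holds the lock for this client */
};

/*
 * Sketch of send_granted() with an int return: 0 if the message
 * was sent, -1 if the client handle was not obtained.
 */
static int
sketch_send_granted(const struct sketch_lock *fl)
{
	assert(fl != NULL);
	if (!fl->client_reachable)
		return (-1);	/* no handle, NLM_GRANTED can't be sent */
	/* The real code would issue the NLM_GRANTED rpc here. */
	return (0);
}

/*
 * Sketch of the check proposed for retry_blockingfilelocklist():
 * keep the lock only if the client could be told about it.
 */
static bool
sketch_retry_blocked_lock(struct sketch_lock *fl)
{
	assert(fl != NULL);
	fl->held = true;		/* lock is free, take it for the waiter */
	if (sketch_send_granted(fl) == -1) {
		fl->held = false;	/* client can't know, so give it back */
		return (false);		/* client has to ask again */
	}
	return (true);
}
```

With the lock given back on failure, another client can take it
during the 30 seconds before the first client retries.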
I was thinking an alternative fix would be to add the
client handle to struct file_lock. But reading the
comments before get_client() in lock_proc.c gives good
reasons why you don't want to do that (e.g. the client
host reboots and the client handle is no longer
valid).
I have created a patch, but until I can get a Linux
client into the state where nfs and lockd are running
on it but not listed in the rpc registry I won't be
able to test it exactly (although I could do a test by
altering the code so that send_granted() always
failed).