rpc.lockd spinning; much breakage
Robert Watson
rwatson at FreeBSD.org
Tue May 13 10:45:45 PDT 2003
On Tue, 13 May 2003, Andrew P. Lentvorski, Jr. wrote:
> On Mon, 12 May 2003, Robert Watson wrote:
>
> > (3) Sometimes rpc.lockd on 5.x acting as a server gets really confused
> > when you mix local and remote locks. I haven't quite figured out the
> > circumstances, but occasionally I run into a situation where a client
> > contends against an existing lock on the server, and the client never
> > receives a notification from the server that the lock has been
> > released. It looks like the server stores state that the lock is
> > contended, but perhaps never properly re-polls the kernel to see if
> > the lock has been locally re-released:
>
> I just looked at the code again. rpc.lockd does not spawn off extra
> processes to continuously poll the kernel. It assumes that it has control
> of the underlying file and only rechecks the blockedlocklist when it
> receives and grants an NFS file unlock.
>
> Consequently, contention on the hardware needs to actually cause a *fail*
> and not queue up a lock for later. Currently, it returns a fail but
> still executes add_blockingfilelock. The offending code in lockd_lock.c
> is:
<...>
> A possible fix should be:
<...>
> This should cause the server to return nlm4_denied and the client should
> eventually retry the lock rather than waiting on the server.
>
> CAUTION! I haven't checked or compiled this code. If folks need me to,
> I can, but it will be a couple of days as I don't have two machines
> handy that I can install -CURRENT on and set up NFS.
The code actually compiles fine, and even runs :-). I now reliably get
EACCES for blocking and non-blocking lock requests on the client when
contending against a server lock. Here are the cases:
(1) Client attempts blocking and non-blocking O_EXLOCK on open,
uncontended:
crash1:/tmp> ./locktest nocreate openlock block noflock test 1
sleep 1
crash1:/tmp> ./locktest nocreate openlock nonblock noflock test 1
sleep 1
Log entries on client:
May 13 13:38:24 crash1 rpc.lockd: nlm_lock_res from 192.168.50.1
May 13 13:38:25 crash1 rpc.lockd: nlm_unlock_res from 192.168.50.1
May 13 13:38:25 crash1 rpc.lockd: process 596: No such process
May 13 13:38:38 crash1 rpc.lockd: nlm_lock_res from 192.168.50.1
May 13 13:38:39 crash1 rpc.lockd: nlm_unlock_res from 192.168.50.1
May 13 13:38:39 crash1 rpc.lockd: process 597: No such process
Note the odd ESRCH ("No such process") logged at the end of each run,
although the test program itself appears to operate fine.
(2) Client attempts blocking and non-blocking O_EXLOCK on open, contended
against a server exclusive lock:
crash1:/tmp> ./locktest nocreate openlock block noflock test 1
open: Permission denied
crash1:/tmp> ./locktest nocreate openlock nonblock noflock test 1
open: Permission denied
Log entries on client:
May 13 13:40:53 crash1 rpc.lockd: nlm_lock_res from 192.168.50.1
May 13 13:40:57 crash1 rpc.lockd: nlm_lock_res from 192.168.50.1
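For readers without the test program at hand, the two cases above can be approximated with a minimal sketch along the following lines. This is not the locktest source: the function name open_exclusive is hypothetical, and the flock() fallback for platforms that lack O_EXLOCK is an assumption of the sketch only.

```python
import fcntl
import os


def open_exclusive(path, blocking=True):
    """Open `path` read-write and take an exclusive lock on it.

    Uses O_EXLOCK where the platform defines it (the BSDs, macOS).
    Elsewhere it falls back to open() followed by flock(); that
    fallback is an assumption of this sketch, not what locktest does.
    Raises OSError if the open or the lock request fails.
    """
    flags = os.O_RDWR
    exlock = getattr(os, "O_EXLOCK", None)
    if exlock is not None:
        flags |= exlock
        if not blocking:
            # With O_EXLOCK, O_NONBLOCK makes open() fail instead of wait.
            flags |= os.O_NONBLOCK
        return os.open(path, flags)

    # Fallback path: lock after opening.
    fd = os.open(path, flags)
    op = fcntl.LOCK_EX | (0 if blocking else fcntl.LOCK_NB)
    try:
        fcntl.flock(fd, op)
    except OSError:
        os.close(fd)  # do not leak the descriptor on a failed lock
        raise
    return fd
```

On a local filesystem a contended non-blocking request typically fails with EAGAIN; in the NFS runs above the client reported EACCES ("Permission denied") for both the blocking and the non-blocking request.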
So after this patch the client still isn't retrying or mapping errors
correctly, but the failure modes are more consistent, and I no longer seem
to be getting interminable hangs on the client.
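Until the client retries on its own, an application can work around a denied request by retrying itself. The following is a minimal sketch of that idea at the application level, not a change to rpc.lockd. The name lock_with_retry, its parameters, and the try_lock callable are all hypothetical; the only assumption taken from the runs above is that contention surfaces as EACCES (or EAGAIN on local filesystems).

```python
import errno
import time


def lock_with_retry(try_lock, attempts=5, delay=0.5):
    """Call try_lock() until it succeeds or the attempts run out.

    try_lock is any callable that raises OSError with errno EACCES or
    EAGAIN while the lock is held elsewhere. Returns whatever try_lock
    returns; re-raises the last OSError if the lock stays contended.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return try_lock()
        except OSError as exc:
            if exc.errno not in (errno.EACCES, errno.EAGAIN):
                raise  # a real error, not lock contention
            if attempt == attempts - 1:
                raise  # still contended after the last attempt
            time.sleep(delay)  # wait before asking again
```

A fixed delay keeps the sketch short; a real caller would probably want a backoff and an overall timeout, since a denied request gives no hint about when the lock will be released.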
Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert at fledge.watson.org   Network Associates Laboratories