CFR: New NFS Lock Manager
dfr at rabson.org
Fri Mar 21 03:34:37 PDT 2008
As I mentioned previously, I have been working on a brand new NFS Lock
Manager which runs in kernel mode and uses the normal local locking
infrastructure for its state. I'm currently trying to tie up the last
few loose ends before committing this work to current. You can find a
snapshot of this code at http://people.freebsd.org/~dfr/lockd-RC1-20032008.diff
To try it out, take a recent current (I last merged with current on
20th March) and apply the patch. Build a kernel with the NFSLOCKD
option and add '-k' to 'rpc_lockd_flags' in rc.conf. You will need to
build and install at least a new libc and rpc.lockd.
At this point, it would be useful to get some extra eyes to look over
my changes, in particular the following:
1. Choice of syscall number - I found one spare next to the NFS
syscall and took that. The new syscall is listed in the FBSD_1.1
namespace, though possibly it should be somewhere else.
2. ABI compatibility - I extended the flock structure by one member
(adding l_sysid). I have added new operations to fcntl to support the
new extended structure, leaving the old operations in place to work on
the old structure. The kernel translates old to new and vice versa. No
attempt is made to allow a new userland to work with an old kernel.
3. The local lock manager has had a complete rewrite to support
required features. The new local lock manager supports a more flexible
model of lock ownership (which can support remote lock owners). I have
replaced the inadequate deadlock detection code with a new (and fast)
graph-based system. Using the deadlock graph, I was able to avoid the
'thundering herd' issues the old lock code had when many processes
were contending for the same locked region. Given the extent of the
changes, wider testing and review would be extremely welcome.
4. The NFS lock manager itself is brand new code and as such ought to
be reviewed. I have also ported the userland sunrpc code to run in the
kernel environment which may prove useful in future.
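
To make the ABI compatibility point (item 2) concrete, here is a minimal
sketch of the old/new translation. Everything in it is an assumption for
illustration only: the struct names old_flock and new_flock, the field
types, the SYSID_LOCAL constant and the two helper functions are
hypothetical and are not taken from the patch. The only detail that
comes from the description above is that the new structure gains one
member, l_sysid, and that the kernel converts in both directions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the pre-patch structure (no l_sysid). */
struct old_flock {
    int64_t l_start;   /* starting offset */
    int64_t l_len;     /* length, 0 means "to end of file" */
    int32_t l_pid;     /* lock owner */
    int16_t l_type;    /* lock type */
    int16_t l_whence;  /* how l_start is interpreted */
};

/* Hypothetical stand-in for the extended structure. */
struct new_flock {
    int64_t l_start;
    int64_t l_len;
    int32_t l_pid;
    int16_t l_type;
    int16_t l_whence;
    int32_t l_sysid;   /* new member identifying the owning system */
};

/* Assumption: an l_sysid of zero denotes a lock owned by the local system. */
#define SYSID_LOCAL 0

/* Old operation coming in: widen the caller's structure, owner is local. */
static void
flock_old_to_new(const struct old_flock *ofl, struct new_flock *fl)
{
    assert(ofl != NULL && fl != NULL);
    memset(fl, 0, sizeof(*fl));
    fl->l_start = ofl->l_start;
    fl->l_len = ofl->l_len;
    fl->l_pid = ofl->l_pid;
    fl->l_type = ofl->l_type;
    fl->l_whence = ofl->l_whence;
    fl->l_sysid = SYSID_LOCAL;
}

/* Old operation going out: narrow the result, l_sysid is dropped. */
static void
flock_new_to_old(const struct new_flock *fl, struct old_flock *ofl)
{
    assert(fl != NULL && ofl != NULL);
    memset(ofl, 0, sizeof(*ofl));
    ofl->l_start = fl->l_start;
    ofl->l_len = fl->l_len;
    ofl->l_pid = fl->l_pid;
    ofl->l_type = fl->l_type;
    ofl->l_whence = fl->l_whence;
}
```

With this shape an old binary never sees l_sysid, so a conflicting lock
held by a remote owner would be reported to it with only the pid-style
fields filled in.
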
Features of the new code:
* Thread-safe kernel RPC client - many threads can use the same RPC
client handle safely with replies being de-multiplexed at the socket
upcall (typically driven directly by the NIC interrupt) and handed off
to whichever thread matches the reply. For UDP sockets, many RPC
clients can share the same socket. This allows the use of a single
privileged UDP port number to talk to an arbitrary number of remote
hosts.
* Single-threaded kernel RPC server. Adding support for a multi-threaded
server would be relatively straightforward and would follow
approximately the Solaris KPI. A single thread should be sufficient
for the NLM since it should rarely block in normal operation.
* Kernel mode NLM server supporting cancel requests and granted
callbacks. I've tested the NLM server reasonably extensively - it
passes both my own tests and the NFS Connectathon locking tests
running on Solaris, Mac OS X and Ubuntu Linux.
* Userland NLM client is supported. While the NLM server doesn't have
support for the local NFS client's locking needs, it does have to
field async replies and granted callbacks from remote NLMs that the
local client has contacted. We relay these replies to the userland
rpc.lockd over a local domain RPC socket.
* IPv6 should be supported but has not been tested since I've been
unable to get IPv6 to work properly with the Parallels virtual
machines that I've been using for development.
* Robust deadlock detection for the local lock manager. In particular
it will detect deadlocks caused by a lock request that covers more
than one blocking request. As required by the NLM protocol, all
deadlock detection happens synchronously - a user is guaranteed that
if a lock request isn't rejected immediately, the lock will eventually
be granted. The old system allowed for a 'deferred deadlock' condition
where a blocked lock request could wake up and find that some other
deadlock-causing lock owner had beaten it to the lock.
* Since both local and remote locks are managed by the same kernel
locking code, local and remote processes can safely use file locks for
mutual exclusion. Local processes have no fairness advantage compared
to remote processes when contending to lock a region that has just
been unlocked - the local lock manager enforces a strict first-come
first-served model for both local and remote lockers.
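
The synchronous deadlock check described above can be sketched as a
wait-for graph. This is only an illustration under stated assumptions,
not the data structure used in the patch: the fixed-size adjacency
matrix, the MAX_OWNERS limit and the wg_* function names are all
hypothetical. The idea it shows is that a request which would block on
several owners at once is rejected immediately if any one of those
owners is already (directly or indirectly) waiting for the requester,
so a request that is allowed to sleep can never be part of a cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_OWNERS 16   /* hypothetical limit, for the sketch only */

/* waits_for[a][b] is true when owner a is blocked on a lock held by b. */
struct wait_graph {
    bool waits_for[MAX_OWNERS][MAX_OWNERS];
};

static void
wg_init(struct wait_graph *g)
{
    memset(g, 0, sizeof(*g));
}

/* Depth-first search: can 'from' reach 'target' along waits-for edges? */
static bool
wg_reaches(const struct wait_graph *g, int from, int target, bool *visited)
{
    if (from == target)
        return true;
    visited[from] = true;
    for (int next = 0; next < MAX_OWNERS; next++) {
        if (g->waits_for[from][next] && !visited[next] &&
            wg_reaches(g, next, target, visited))
            return true;
    }
    return false;
}

/*
 * Try to block 'waiter' on every owner in blockers[]. If any of the new
 * edges would close a cycle, the request is rejected and the graph is
 * left unchanged; otherwise all edges are recorded and the caller may
 * sleep.
 */
static bool
wg_try_block(struct wait_graph *g, int waiter, const int *blockers,
    int nblockers)
{
    assert(waiter >= 0 && waiter < MAX_OWNERS);
    for (int i = 0; i < nblockers; i++) {
        bool visited[MAX_OWNERS] = { false };

        assert(blockers[i] >= 0 && blockers[i] < MAX_OWNERS);
        if (wg_reaches(g, blockers[i], waiter, visited))
            return false;   /* deadlock: reject synchronously */
    }
    for (int i = 0; i < nblockers; i++)
        g->waits_for[waiter][blockers[i]] = true;
    return true;
}

/* Drop the waiter's outgoing edges once its lock has been granted. */
static void
wg_unblock(struct wait_graph *g, int waiter)
{
    assert(waiter >= 0 && waiter < MAX_OWNERS);
    for (int i = 0; i < MAX_OWNERS; i++)
        g->waits_for[waiter][i] = false;
}
```

For example, if owner 0 waits for owner 1 and owner 1 waits for owner 2,
then a request by owner 2 that would block on owners 3 and 0 is refused,
because the edge to owner 0 completes a cycle even though the edge to
owner 3 alone would be harmless.
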