CFR: New NFS Lock Manager

Fri Mar 21 03:34:37 PDT 2008

As I mentioned previously, I have been working on a brand new NFS Lock  
Manager which runs in kernel mode and uses the normal local locking  
infrastructure for its state. I'm currently trying to tie up the last  
few loose ends before committing this work to current. You can find a
snapshot of this code at:

http://people.freebsd.org/~dfr/lockd-RC1-20032008.diff

To try it out, take a recent current (I last merged with current on  
20th March) and apply the patch. Build a kernel with the NFSLOCKD  
option and add '-k' to 'rpc_lockd_flags' in rc.conf. You will need to  
build and install at least a new libc and rpc.lockd.

At this point, it would be useful to get some extra eyes to look over  
my changes. In particular the following:

1. Choice of syscall number - I found one spare next to the NFS
syscall and took that. The new syscall is listed in the FBSD_1.1
namespace; possibly it should be somewhere else.

2. ABI compatibility - I extended the flock structure by one member  
(adding l_sysid). I have added new operations to fcntl to support the  
new extended structure, leaving the old operations in place to work on  
the old structure. The kernel translates old to new and vice versa. No  
attempt is made to allow a new userland to work with an old kernel.

3. The local lock manager has had a complete rewrite to support  
required features. The new local lock manager supports a more flexible  
model of lock ownership (which can support remote lock owners). I have  
replaced the inadequate deadlock detection code with a new (and fast)
graph-based system. Using the deadlock graph, I was able to avoid the
'thundering herd' issues the old lock code had when many processes  
were contending for the same locked region. Given the extent of the  
changes, wider testing and review would be extremely welcome.

4. The NFS lock manager itself is brand new code and as such ought to  
be reviewed. I have also ported the userland sunrpc code to run in the
kernel environment, which may prove useful in future.
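
To make the translation described in point 2 concrete, here is a minimal
userland sketch of converting between an old-style and an extended lock
structure. The names old_flock, new_flock, old_to_new, new_to_old and
LOCAL_SYSID, the field widths, and the choice of 0 as the local system
id are all assumptions made for illustration; none of this is taken
from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts for illustration only; not copied from the patch. */
struct old_flock {
	int64_t l_start;	/* starting offset */
	int64_t l_len;		/* length of the region */
	int32_t l_pid;		/* lock owner */
	int16_t l_type;		/* kind of lock */
	int16_t l_whence;	/* how to interpret l_start */
};

struct new_flock {
	int64_t l_start;
	int64_t l_len;
	int32_t l_pid;
	int16_t l_type;
	int16_t l_whence;
	int32_t l_sysid;	/* added member: which system owns the lock */
};

#define LOCAL_SYSID 0		/* assumed value meaning "this machine" */

/* Old -> new: copy the shared members and mark the owner as local. */
static void
old_to_new(const struct old_flock *ofl, struct new_flock *fl)
{
	assert(ofl != NULL && fl != NULL);
	fl->l_start = ofl->l_start;
	fl->l_len = ofl->l_len;
	fl->l_pid = ofl->l_pid;
	fl->l_type = ofl->l_type;
	fl->l_whence = ofl->l_whence;
	fl->l_sysid = LOCAL_SYSID;
}

/* New -> old: copy the shared members; l_sysid has nowhere to go. */
static void
new_to_old(const struct new_flock *fl, struct old_flock *ofl)
{
	assert(ofl != NULL && fl != NULL);
	ofl->l_start = fl->l_start;
	ofl->l_len = fl->l_len;
	ofl->l_pid = fl->l_pid;
	ofl->l_type = fl->l_type;
	ofl->l_whence = fl->l_whence;
}
```

In this sketch an old-style request is widened on the way in and the
result is narrowed on the way out, so an old caller never sees the
extra member.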

Highlights include:

* Thread-safe kernel RPC client - many threads can use the same RPC  
client handle safely with replies being de-multiplexed at the socket  
upcall (typically driven directly by the NIC interrupt) and handed off  
to whichever thread matches the reply. For UDP sockets, many RPC  
clients can share the same socket. This allows the use of a single  
privileged UDP port number to talk to an arbitrary number of remote  
hosts.

* Single-threaded kernel RPC server. Adding support for a multi-threaded
server would be relatively straightforward and would follow
approximately the Solaris KPI. A single thread should be sufficient
for the NLM since it should rarely block in normal operation.

* Kernel mode NLM server supporting cancel requests and granted  
callbacks. I've tested the NLM server reasonably extensively - it  
passes both my own tests and the NFS Connectathon locking tests  
running on Solaris, Mac OS X and Ubuntu Linux.

* Userland NLM client supported. While the NLM server doesn't have  
support for the local NFS client's locking needs, it does have to  
field async replies and granted callbacks from remote NLMs that the  
local client has contacted. We relay these replies to the userland  
rpc.lockd over a local domain RPC socket.

* IPv6 should be supported but has not been tested since I've been  
unable to get IPv6 to work properly with the Parallels virtual  
machines that I've been using for development.

* Robust deadlock detection for the local lock manager. In particular  
it will detect deadlocks caused by a lock request that covers more  
than one blocking request. As required by the NLM protocol, all  
deadlock detection happens synchronously - a user is guaranteed that  
if a lock request isn't rejected immediately, the lock will eventually  
be granted. The old system allowed for a 'deferred deadlock' condition
where a blocked lock request could wake up and find that some other
deadlock-causing lock owner had beaten it to the lock.

* Since both local and remote locks are managed by the same kernel  
locking code, local and remote processes can safely use file locks for  
mutual exclusion. Local processes have no fairness advantage compared  
to remote processes when contending to lock a region that has just  
been unlocked - the local lock manager enforces a strict first-come  
first-served model for both local and remote lockers.
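
The synchronous deadlock check described above can be illustrated with
a small wait-for graph. This is a minimal sketch, not the algorithm
used in the patch: it uses a fixed-size adjacency matrix and a plain
depth-first search, and the names wait_graph, block_on, unblock and
MAX_OWNERS are hypothetical. A request that conflicts with several
existing locks passes all of their owners as blockers, and is refused
up front if waiting on any of them would close a cycle.

```c
#include <assert.h>
#include <string.h>

#define MAX_OWNERS 16		/* arbitrary limit for the sketch */

struct wait_graph {
	/* edge[a][b] != 0 means owner a is blocked waiting for owner b */
	unsigned char edge[MAX_OWNERS][MAX_OWNERS];
};

static void
graph_init(struct wait_graph *g)
{
	memset(g, 0, sizeof(*g));
}

/* Depth-first search: can 'from' reach 'target' by following edges? */
static int
reaches(const struct wait_graph *g, int from, int target,
    unsigned char *seen)
{
	if (from == target)
		return (1);
	if (seen[from])
		return (0);
	seen[from] = 1;
	for (int next = 0; next < MAX_OWNERS; next++) {
		if (g->edge[from][next] && reaches(g, next, target, seen))
			return (1);
	}
	return (0);
}

/*
 * Try to block 'requester' on every owner in 'blockers'. Returns -1
 * and leaves the graph untouched if that would create a cycle;
 * otherwise records the edges and returns 0.
 */
static int
block_on(struct wait_graph *g, int requester, const int *blockers,
    int nblockers)
{
	assert(requester >= 0 && requester < MAX_OWNERS);
	for (int i = 0; i < nblockers; i++) {
		unsigned char seen[MAX_OWNERS] = { 0 };

		assert(blockers[i] >= 0 && blockers[i] < MAX_OWNERS);
		if (reaches(g, blockers[i], requester, seen))
			return (-1);
	}
	for (int i = 0; i < nblockers; i++)
		g->edge[requester][blockers[i]] = 1;
	return (0);
}

/* The requester got its lock (or gave up): drop its outgoing edges. */
static void
unblock(struct wait_graph *g, int requester)
{
	assert(requester >= 0 && requester < MAX_OWNERS);
	memset(g->edge[requester], 0, sizeof(g->edge[requester]));
}
```

Because the check runs before any edge is added, a request either fails
immediately or joins a graph that is still acyclic, which is the
property the NLM protocol needs.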