rpc.lockd kernel RPC over UDP patch for testing/review

Thu Aug 29 00:03:00 UTC 2013

Hi,

Doug White sent this to me via email some time ago (I hope he doesn't
mind me reposting it here):
> First, we have an installed client system doing heavy NFS lock traffic that occasionally
> experiences lockd lockups that require a system reboot to clear. Diagnosis of 
> the most recent hang identified corruption of one of the tracking variables
> (cu->cu_send specifically) in the congestion control in clnt_dg_call() as the culprit. 
> Since lockd only uses one thread, no congestion control is really necessary. We are
> going to make a local patch to avoid the if() that leads to the msleep() if 
> cu->threads = 1 so we don't run into that again, though the corruption of
> cu_send is still a bit troubling. The corruption might stem from repeated retries allowing 
> cu_send to grow without bound, or some other bizarre code path that causes underflow.

After inspecting the code, I found two places where cu_sent (Doug called it cu_send just to
try and confuse me. It worked for a while ;-) wasn't incremented when a request was re-inserted
into the send queue. Since it is always decremented when a request is dequeued, I think this
imbalance could have left cu_sent with a bogus value.

The simple patch at:
 http://people.freebsd.org/~rmacklem/rpcudp.patch
adds the missing cu_sent increments at these two places.
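
To illustrate the accounting problem described above, here is a minimal toy
model in C. It is only a sketch and not the actual clnt_dg.c code: the struct,
the function names (toy_enqueue, toy_dequeue, toy_requeue, toy_run) and the
"fixed" flag are all made up for illustration. It just shows how a counter that
is always decremented on dequeue, but not incremented on re-insert, drifts away
from the real number of in-flight requests.

```c
/* Hypothetical toy model of the counter imbalance; NOT the FreeBSD code. */
struct toy_conn {
	int sent;	/* stands in for cu_sent: requests counted as in flight */
};

/* First insertion into the send queue always counts the request. */
static void
toy_enqueue(struct toy_conn *c)
{
	c->sent++;
}

/* Removal from the send queue always uncounts the request. */
static void
toy_dequeue(struct toy_conn *c)
{
	c->sent--;
}

/*
 * Re-insertion for a retry. With fixed == 0 the increment is missing
 * (the suspected bug); with fixed != 0 the increment is present.
 */
static void
toy_requeue(struct toy_conn *c, int fixed)
{
	if (fixed)
		c->sent++;
}

/* Send one request, retry it "retries" times, then complete it. */
static int
toy_run(int retries, int fixed)
{
	struct toy_conn c = { 0 };
	int i;

	toy_enqueue(&c);
	for (i = 0; i < retries; i++) {
		toy_dequeue(&c);	/* taken off the queue for the retry */
		toy_requeue(&c, fixed);	/* put back on the queue */
	}
	toy_dequeue(&c);		/* request finally completes */
	return (c.sent);		/* 0 means the accounting balanced */
}
```

In this model toy_run(3, 0) returns -3 (each retry loses one count), while
toy_run(3, 1) returns 0. Whether the real counter drifts in the same direction
depends on the actual code paths, so treat this only as a picture of the
imbalance, not a diagnosis.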

If anyone is using rpc.lockd and can test/review this patch, it would be appreciated.

Thanks, rick