HEADS UP: inpcb/inpcbinfo rwlocking: coming to a 7-STABLE branch near you

Robert Watson rwatson at FreeBSD.org
Sun Aug 3 10:54:43 UTC 2008


Dear all:

This is an advance warning that, late next week, I will be merging a fairly 
large set of changes to the IPv4 and IPv6 protocols layered over the 
inpcb/inpcbinfo kernel infrastructure.  To be specific, this affects TCP, UDP, 
and raw sockets on both IPv4 and IPv6.  I will post a further e-mail 
announcement along with patch set and schedule in a day or two once it's 
prepared.

The thrust of this change is to replace the mutexes protecting the inpcb and 
inpcbinfo data structures with read-write locks (rwlocks).  These structures 
represent, respectively, particular sockets and the global socket lists for 
all socket types in IPv4 and IPv6 except for SCTP.  When you run netstat, 
inpcbinfo is the data structure referencing all connections, and each line in 
the netstat output reflects the contents of a specific inpcb.

In the current stage of this work, the intent is to improve performance for 
datagram-related protocols on SMP systems by allowing concurrent acquisition 
of both global and connection locks during receive and transmit.  This is 
possible because, in the common case, no connection or global state is 
modified during UDP/raw receive and transmit at the IP layer, so a read lock 
is sufficient to prevent data in those structures from unexpectedly changing. 
For receive, socket layer state is modified, but this is separately protected 
by socket layer locks.  On transmit, no state is modified at any layer, so in 
principle we will allow fully parallel transmit from multiple threads down to 
about the routing and network interface layers, whereas previously they would 
bottleneck in UDP.

The applications targeted by this change are threaded UDP server applications, 
such as BIND9, nsd, and UDP-based memcached.  Kris Kennaway and Paul Saab have 
done fairly extensive testing with the changes and demonstrated significant 
performance improvements due to reduced contention and overhead.  Perhaps they 
can mention some of those numbers in a follow-up to this post.

The reason for the heads up is that, while carefully tested, changes of this 
sort do come with risks.  We've carefully structured them so as to avoid 
breaking the ABIs for netstat, etc., but it's not impossible that some problems 
will arise as the changes settle.  The goal, however, is to see these 
performance improvements in 7.1, and since they've had a bit of time to shake 
out in 8.x and seen some heavy use, I think now is the right time to merge them.

In any case, I will send out e-mail in a couple of days with a proposed merge 
patch and schedule for merging, and perhaps if you are in a position where 
you might benefit from these improvements, or have interesting UDP or 
raw-socket based applications running on 7.x, you could test the candidate 
patch before it's merged, reporting any problems.  Unless I receive negative 
feedback, I will plan on merging the changes late in the week, and keep a 
close eye on stable@ for any reports of problems.

Thanks,

Robert N M Watson
Computer Laboratory
University of Cambridge

