HEADS UP: inpcb/inpcbinfo rwlocking: coming to a 7-STABLE branch
rwatson at FreeBSD.org
Sun Aug 3 10:54:43 UTC 2008
This is an advance warning that, late next week, I will be merging a fairly
large set of changes to the IPv4 and IPv6 protocols layered over the
inpcb/inpcbinfo kernel infrastructure. To be specific, this affects TCP, UDP,
and raw sockets on both IPv4 and IPv6. I will post a further e-mail
announcement along with patch set and schedule in a day or two once it's
The thrust of this change is to replace the mutexes protecting the inpcb and
inpcbinfo data structures with read-write locks (rwlocks). These structures
represent, respectively, particular sockets and the global socket lists for
all socket types in IPv4 and IPv6 except for SCTP. When you run netstat,
inpcbinfo is the data structure referencing all connections, and each line in
the netstat output reflects the contents of a specific inpcb.
In the current stage of this work, the intent is to improve performance for
datagram-related protocols on SMP systems by allowing concurrent acquisition
of both global and connection locks during receive and transmit. This is
possible because, in the common case, no connection or global state is
modified during UDP/raw receive and transmit at the IP layer, so a read lock
is sufficient to prevent data in those structures from unexpectedly changing.
For receive, socket layer state is modified, but this is separately protected
by socket layer locks. On transmit, no state is modified at any layer, so in
principle we will allow fully parallel transmit from multiple threads down to
about the routing and network interface layers, whereas previously they would
bottleneck in UDP.
The applications targeted by this change are threaded UDP server applications,
such as BIND9, nsd, and UDP-based memcached. Kris Kennaway and Paul Saab have
done fairly extensive testing with the changes and demonstrated significant
performance improvements due to reduced contention and overhead. Perhaps they
can mention some of those numbers in a follow-up to this post.
The reason for the heads up is that, while carefully tested, changes of this
sort do come with risks. We've carefully structured them so as to avoid
breaking the ABIs for netstat, etc., but it's not impossible that some problems
will arise as the changes settle. The goal, however, is to see these
performance improvements in 7.1, and since they've had some time to shake out in
8.x and seen some heavy use, I think now is the right time to merge them.
In any case, I will send out e-mail in a couple of days with a proposed merge
patch and schedule for merging, and perhaps if you are in a position where
you might benefit from these improvements, or have interesting UDP or
raw-socket based applications running on 7.x, you could test the candidate
patch before it's merged, reporting any problems. Unless I receive negative
feedback, I will plan on merging the changes late in the week, and keep a
close eye on stable@ for any reports of problems.
Robert N M Watson
University of Cambridge