TCP stack lock contention with short-lived connections

Thu Nov 7 14:10:51 UTC 2013

Hi list,

On Mon, 04 Nov 2013 22:21:04 +0100, Julien Charbon <jcharbon at verisign.com> wrote:
>   Just a follow-up of the vBSDCon discussions about FreeBSD TCP
> performance with short-lived connections.  In summary: <snip>
>
> I have put technical and how-to-repeat details in the PR below:
>
> kern/183659: TCP stack lock contention with short-lived connections
> http://www.freebsd.org/cgi/query-pr.cgi?pr=183659
>
>   We are currently working on this performance improvement effort;  it
> will impact only the TCP locking strategy, not the TCP stack logic
> itself.  We will share the patches we make on freebsd-net for review
> and improvement suggestions;  in any case, this change may also need
> enough eyeballs to avoid introducing tricky race conditions into the
> TCP stack.

  Just a follow-up:  we are currently removing the TCP INP_INFO lock from
places where it is not actually required, in order to mitigate the lock
contention.  This seems a good first step in this effort:  small
changes, easy to review, low risk (and small gains... right).

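  Below is a first patch that removes the INP_INFO lock from
tcp_usr_accept().  This change simply follows the advice given in the
corresponding code comment:  "A better fix would prevent the socket from
being placed in the listen queue until all fields are fully initialized."
In the patch, syncache_socket() calls sonewconn() with a connstatus of 0
instead of SS_ISCONNECTED, so the new socket is not yet visible to
accept(2).  It then calls soisconnected() only after the inpcb has been
fully initialized.  For more technical details, see the commit log of
the related change: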

http://svnweb.freebsd.org/base?view=revision&revision=175612

  With this patch applied we see no regressions and a performance
improvement of ~5%.  A vanilla 9.2 kernel handles 52k TCP queries per
second (QPS), while 9.2 + the attached patch handles 55k TCP QPS.  Not
huge indeed, but still an improvement.

  P.S.:  Funnily enough, it seems the same change has already been
proposed in the past:
http://lists.freebsd.org/pipermail/freebsd-net/2013-January/034261.html

--
Julien

From: Julien Charbon <jcharbon at verisign.com>
Subject: [PATCH] Add new socket in listen queue only when fully initialized

---
 sys/netinet/tcp_syncache.c | 4 +++-
 sys/netinet/tcp_usrreq.c   | 9 ---------
 2 files changed, 3 insertions(+), 10 deletions(-)

diff --git a/sys/netinet/tcp_syncache.c b/sys/netinet/tcp_syncache.c
index af1651a..eb73356 100644
--- a/sys/netinet/tcp_syncache.c
+++ b/sys/netinet/tcp_syncache.c
@@ -660,7 +660,7 @@ syncache_socket(struct syncache *sc, struct socket *lso, struct mbuf *m)
 	 * connection when the SYN arrived.  If we can't create
 	 * the connection, abort it.
 	 */
-	so = sonewconn(lso, SS_ISCONNECTED);
+	so = sonewconn(lso, 0);
 	if (so == NULL) {
 		/*
 		 * Drop the connection; we will either send a RST or
@@ -890,6 +890,8 @@ syncache_socket(struct syncache *sc, struct socket *lso, struct mbuf *m)
 
 	INP_WUNLOCK(inp);
 
+	soisconnected(so);
+
 	TCPSTAT_INC(tcps_accepts);
 	return (so);
 
diff --git a/sys/netinet/tcp_usrreq.c b/sys/netinet/tcp_usrreq.c
index b83f34a..566cc34 100644
--- a/sys/netinet/tcp_usrreq.c
+++ b/sys/netinet/tcp_usrreq.c
@@ -609,13 +609,6 @@ out:
 /*
  * Accept a connection.  Essentially all the work is done at higher levels;
  * just return the address of the peer, storing through addr.
- *
- * The rationale for acquiring the tcbinfo lock here is somewhat complicated,
- * and is described in detail in the commit log entry for r175612.  Acquiring
- * it delays an accept(2) racing with sonewconn(), which inserts the socket
- * before the inpcb address/port fields are initialized.  A better fix would
- * prevent the socket from being placed in the listen queue until all fields
- * are fully initialized.
  */
 static int
 tcp_usr_accept(struct socket *so, struct sockaddr **nam)
@@ -632,7 +625,6 @@ tcp_usr_accept(struct socket *so, struct sockaddr **nam)
 
 	inp = sotoinpcb(so);
 	KASSERT(inp != NULL, ("tcp_usr_accept: inp == NULL"));
-	INP_INFO_RLOCK(&V_tcbinfo);
 	INP_WLOCK(inp);
 	if (inp->inp_flags & (INP_TIMEWAIT | INP_DROPPED)) {
 		error = ECONNABORTED;
@@ -652,7 +644,6 @@ tcp_usr_accept(struct socket *so, struct sockaddr **nam)
 out:
 	TCPDEBUG2(PRU_ACCEPT);
 	INP_WUNLOCK(inp);
-	INP_INFO_RUNLOCK(&V_tcbinfo);
 	if (error == 0)
 		*nam = in_sockaddr(port, &addr);
 	return error;