New optimized soreceive_stream() for TCP sockets, proof of concept

Fri Mar 2 13:32:31 UTC 2007

Currently we are using the generic soreceive_generic() function to pull and
copy data from the socket buffer to userland.  It is a huge function that
can deal with all eventualities and types of data that may show up in socket
buffers.  Most importantly from a performance point of view, it does an
unlock-lock cycle per mbuf data segment that is copied out.  This is necessary
to avoid deadlocks.  On high speed TCP connections this leads to high locking
overhead and contention on the receive socket buffer lock as both the upper
and the lower half have to compete.  The lower half wants to add newly received
data while the upper half wants to move it to userland and the application.
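
As a rough userland illustration of that per-segment locking pattern (not the
kernel code), the sketch below uses a pthread mutex in place of the socket
buffer lock and a simple linked list in place of an mbuf chain.  The names
struct seg, struct sbuf, recv_per_segment() and lock_cycles are made up for
this example, and it only handles whole segments.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one mbuf data segment. */
struct seg {
	struct seg	*next;
	size_t		 len;
	const char	*data;
};

/* Hypothetical stand-in for a receive socket buffer. */
struct sbuf {
	pthread_mutex_t	 lock;
	struct seg	*head;
	unsigned long	 lock_cycles;	/* times the reader took the lock */
};

/*
 * Copy whole segments to dst until the next one no longer fits.
 * The lock is dropped around every single copy and re-taken afterwards,
 * so the reader pays one lock acquisition per segment.
 */
static size_t
recv_per_segment(struct sbuf *sb, char *dst, size_t room)
{
	size_t done = 0;

	pthread_mutex_lock(&sb->lock);
	sb->lock_cycles++;
	while (sb->head != NULL && sb->head->len <= room - done) {
		struct seg *s = sb->head;

		/* The copy may block in a real kernel, so unlock first. */
		pthread_mutex_unlock(&sb->lock);
		memcpy(dst + done, s->data, s->len);
		done += s->len;
		pthread_mutex_lock(&sb->lock);
		sb->lock_cycles++;

		/* Only now unlink the segment that was copied out. */
		sb->head = s->next;
	}
	pthread_mutex_unlock(&sb->lock);
	return (done);
}
```

With three queued segments the reader ends up taking the lock four times, and
each of those acquisitions can collide with the side that appends new data.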

This patch takes a different approach by adding a specific soreceive_stream()
function that is highly optimized for stream type sockets such as those TCP
uses.  On the send side we made this distinction, in a different way, a long
time ago.

Instead of the unlock-lock dance, soreceive_stream() pulls a properly sized
amount of data (relative to the receive system call buffer space) from the
socket buffer, drops the lock and gives copyout as much time as it needs.  In
the meantime the lower half can happily add as many new packets as it wants
without having to wait for a lock.  It also allows the upper and lower halves
to run on different CPUs without much interference.  There is an unsolved nasty
race condition in the patch though.  When the socket closes and we still have
data around, or the copyout failed, it tries to put the data back into the
socket buffer, which is already gone by then, leading to a panic.  Work is
underway to find a reliable fix for this.  I wanted to get this out to the
community nonetheless to give it some more exposure.
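
The detach-then-copy idea can be sketched in userland roughly as follows.
This is not the code from the patch: a pthread mutex stands in for the socket
buffer lock, a linked list for the mbuf chain, and struct seg, struct sbuf,
sb_append() and recv_detached() are hypothetical names.  It takes whole
segments only and does not attempt the put-back on failure, so it says nothing
about the race described above.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one mbuf data segment. */
struct seg {
	struct seg	*next;
	size_t		 len;
	const char	*data;
};

/* Hypothetical stand-in for a receive socket buffer. */
struct sbuf {
	pthread_mutex_t	 lock;
	struct seg	*head;
	struct seg	*tail;
	unsigned long	 lock_cycles;	/* times the reader took the lock */
};

/* "Lower half": append one newly received segment. */
static void
sb_append(struct sbuf *sb, struct seg *s)
{
	s->next = NULL;
	pthread_mutex_lock(&sb->lock);
	if (sb->tail != NULL)
		sb->tail->next = s;
	else
		sb->head = s;
	sb->tail = s;
	pthread_mutex_unlock(&sb->lock);
}

/*
 * "Upper half": under one lock hold, cut off as many whole segments as
 * fit into room, then copy them out with the lock released.
 */
static size_t
recv_detached(struct sbuf *sb, char *dst, size_t room)
{
	struct seg *first, *last = NULL, *s;
	size_t take = 0, done = 0;

	pthread_mutex_lock(&sb->lock);
	sb->lock_cycles++;
	first = sb->head;
	for (s = first; s != NULL && s->len <= room - take; s = s->next) {
		take += s->len;
		last = s;
	}
	if (last != NULL) {
		/* Detach [first, last] from the buffer. */
		sb->head = last->next;
		if (sb->head == NULL)
			sb->tail = NULL;
		last->next = NULL;
	} else
		first = NULL;		/* nothing fits */
	pthread_mutex_unlock(&sb->lock);

	/* The detached chain is private now; no lock needed for the copy. */
	for (s = first; s != NULL; s = s->next) {
		memcpy(dst + done, s->data, s->len);
		done += s->len;
	}
	return (done);
}
```

Here the reader takes the lock once per call regardless of how many segments
it carries away, and sb_append() can run on another CPU while the copy is in
progress.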

The patch is here:

  http://people.freebsd.org/~andre/soreceive_stream-20070302.diff

Any testing, especially on 10Gig cards, and feedback appreciated.

-- 
Andre