Network Stack Locking

Matthew Dillon dillon at apollo.backplane.com
Mon May 24 20:39:41 PDT 2004


:On Mon, 24 May 2004, Eivind Eklund wrote:
:
:> On Fri, May 21, 2004 at 01:23:51PM -0400, Robert Watson wrote:
:> > The other concern I have is whether the message queues get deep or not: 
:> > many of the benefits of message queues come when the queues allow
:> coalescing of context switches to process multiple packets.  If you're
:> > paying a context switch per packet passing through the stack each time you
:> > cross a boundary, there's a non-trivial operational cost to that.
:> 
:> I don't know what Matt has done here, but at least with the design we
:> used for G2 (a private DFly-like project that John Dyson, I, and a few
:> other people I don't know if want to be anonymous or not ran), this
:> should not be an issue.  We used thread context passing with an API that
:> contained putmsg_and_terminate() and message ports that automatically
:> could spawn new handler threads.  Effectively, a message-related context
:> switch turned into "assemble everything I care about in a small package,
:> reset the stack pointer, and go".  The expectation was that this should
:> end up with less overhead than function calls, as we could drop the call
:> frames for "higher levels in the chain".  We never got to the point
:> where we could measure if it worked out that way in practice, though. 
:
:Sounds a lot like a lot of the Mach IPC optimizations, including their use
:of continuations during IPC to avoid a full context switch.
:
:Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
:robert at fledge.watson.org      Senior Research Scientist, McAfee Research

    Well, I like the performance aspects of a continuation mechanism, but
    I really dislike the memory overhead.  Even a minimal stack is
    expensive when you multiply it by potentially hundreds of thousands
    of 'blocking' entities such as PCBs... say, a TCP output stream.
    Because of this, the overhead and cache pollution generated by the
    continuation mechanism increase as system load increases rather
    than decrease.
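    The arithmetic here is easy to sketch.  Assuming one 4 KiB page per
    blocked stack and 200,000 blocked PCBs (both illustrative numbers I am
    picking for the sketch, not measurements from DragonFly or FreeBSD):

```python
# Back-of-envelope cost of keeping one blocked stack per PCB.
# MIN_STACK_BYTES and BLOCKED_PCBS are assumed figures for illustration.

MIN_STACK_BYTES = 4 * 1024   # assume one 4 KiB page per blocked continuation
BLOCKED_PCBS = 200_000       # "hundreds of thousands" of TCP control blocks

total_bytes = MIN_STACK_BYTES * BLOCKED_PCBS
print(f"{total_bytes / (1024**2):.0f} MiB pinned just for blocked stacks")
# -> 781 MiB
```

    Note that this memory is consumed precisely when the system is busiest,
    which is the point about load scaling above.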

    Deep message queues aren't necessarily a problem and, in fact, having
    one or two dozen messages backed up in a protocol thread's message
    port is actually good because the thread can then process all the
    messages in a tight loop (cpu and cache locality of reference).  If
    designed properly, this directly mitigates the cost of a thread switch
    as system load increases.  So message queueing has the opposite effect...
    per-unit handling overhead *decreases* as system load increases.
    (Also, DragonFly's thread scheduler is a much lighter-weight mechanism
    than what you have in FBsd-4 or FBsd-5.)

    e.g.:  let's say you have a context switch overhead of 1uS and a message
    processing overhead of 100ns.
	
	light load:	100 messages/sec:	1.1uS/message

	medium load:	1000 messages/sec, average 10 messages in queue at
			context switch:		10*100ns+1uS = 2uS/10 =
						200ns/msg

	heavy load:	10000 msgs/sec, average 100 msgs in queue:
						100*100ns+1uS = 11uS/100=
						110ns/msg
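    The amortization above can be sketched directly.  The 1uS switch cost
    and 100ns per-message cost are the assumed figures from the example,
    not measured numbers:

```python
# Amortized per-message cost when a thread drains its whole message
# queue per context switch.  Costs are the assumed figures from the
# example: 1uS per switch, 100ns per message.

SWITCH_NS = 1000   # context switch overhead, in nanoseconds
MSG_NS = 100       # per-message processing overhead, in nanoseconds

def amortized_ns_per_msg(queue_depth: int) -> float:
    """Cost per message when queue_depth messages are drained per switch."""
    return (SWITCH_NS + queue_depth * MSG_NS) / queue_depth

for depth in (1, 10, 100):
    print(f"queue depth {depth:3d}: {amortized_ns_per_msg(depth):.0f} ns/msg")
# -> queue depth   1: 1100 ns/msg
# -> queue depth  10: 200 ns/msg
# -> queue depth 100: 110 ns/msg
```

    The switch cost is fixed per drain, so deeper queues spread it over
    more messages, which is why per-unit overhead falls as load rises.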

    The reason a deep message queue is not a problem vs other mechanisms
    is simple... a message represents a unit of work.  The work must be
    done regardless, and on the cpu it was told to be done on, no matter
    whether you use a message or a continuation or some other mechanism.
    In other words, a deep message queue is actually an effect of the
    problem, not a cause of that problem.  Solving the problem (if it
    actually is a problem) does not involve dealing with the deep message
    queue, it involves dealing with the set of circumstances that are
    causing that deep message queue to occur.

    Now, certainly end-to-end latency is an issue.  But when one is talking
    about context switching one is talking about nanoseconds and microseconds.
    Turn-around latency just isn't an issue most of the time, and in those
    extremely rare cases where it might be, one does the turn-around in the
    driver interrupt anyway.

					-Matt
					Matthew Dillon 
					<dillon at backplane.com>

