Call for performance evaluation: net.isr.direct
rwatson at FreeBSD.org
Tue Oct 11 07:01:13 PDT 2005
On Wed, 5 Oct 2005, Robert Watson wrote:
> In 2003, Jonathan Lemon added initial support for direct dispatch of
> netisr handlers from the calling thread, as part of his DARPA/NAI Labs
> contract in the DARPA CHATS research program. Over the last two years
> since then, Sam Leffler and I have worked to refine this implementation,
> removing a number of ordering related issues, opportunities for
> excessive parallelism, recursion issues, and testing with a broad range
> of network components. There has also been a significant effort to
> complete MPSAFE locking work throughout the network stack. Combined
> with the earlier move to ithreads and a functional direct dispatch
> ("process to completion") implementation, there are a number of exciting
> possible benefits.
If I don't hear anything back in the near future, I will commit a change
to 7.x to make direct dispatch the default, in order to let a broader
community do the testing. :-) If you are set up to easily test stability
and performance relating to direct dispatch, I would appreciate any help.
As of 6.0-RC1 and recent 7.x, the name of the sysctl is "net.isr.direct";
previously it was named "net.isr.enable", but its use is not
recommended in versions that do not use the new name.
Robert N M Watson
> - Possible parallelism by packet source -- ithreads can dispatch
> simultaneously into the higher level network stack layers. Since
> ithreads can execute in parallel on different CPUs, so can code they
> invoke directly.
> - Elimination of context switches in the network receive path -- rather
> than context switching to the netisr thread from the ithread, we can now
> directly execute netisr code from the ithread.
> - A CPU-bound netisr thread on a multi-processor system will no longer
> rate limit traffic to the available resources on one CPU.
> - Eliminating the additional queueing in the handoff reduces the
> opportunity for queues to overfill as a result of scheduling delays.
> There are, however, some possible downsides and/or trade-offs:
> - Higher level network processing will now compete with the interrupt
> handler for CPU resources available to the ithread. This means less
> time for the interrupt code to execute in the thread if the thread is
> - Lower levels of parallelism between portions of the inbound packet
> processing path. Without direct dispatch, there is possible parallelism
> between receive network driver execution and higher level stack layers,
> whereas with direct dispatch they can no longer execute in parallel.
> - Re-queued packets from tunnel and encapsulation processing will now
> require a context switch to process, since they will be processed in the
> netisr proper rather than in the ithread, whereas before the netisr
> thread would pick them up immediately after completing the current
> processing without a context switch.
> - Code that previously ran in the SWI at a SWI priority now runs in the
> ithread at an ithread priority, elevating the general priority at which
> network processing takes place.
> And there are a few mixed things, that can offer good and bad elements:
> - Less queueing takes place in the network stack in in-bound processing:
> packets are taken directly from the driver and processed to completion
> one by one, rather than queued for batch processing. Packets will be
> dropped before the link layer, rather than on the boundary between the
> link and protocol layers. This is good in that we invest less work in
> packets we were going to drop anyway, but bad in that less queueing
> means less room for scheduling delays.
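
As a rough illustration of the queueing trade-off described above, here is a
toy model of the two dispatch modes. It is only a sketch: the class names,
the queue limit, and the handler are hypothetical and do not correspond to
any FreeBSD kernel code, and driver-level drops are not modelled.

```python
from collections import deque


class QueuedDispatch:
    """Toy model: the receive path enqueues, a later netisr pass drains."""

    def __init__(self, handler, maxlen):
        self.handler = handler
        self.maxlen = maxlen      # hypothetical queue limit
        self.queue = deque()
        self.drops = 0

    def receive(self, packet):
        # If the netisr has not run recently, the queue may already be full
        # and the packet is dropped at the handoff.
        if len(self.queue) >= self.maxlen:
            self.drops += 1
            return
        self.queue.append(packet)

    def run_netisr(self):
        # Batch processing once the netisr thread gets scheduled.
        while self.queue:
            self.handler(self.queue.popleft())


class DirectDispatch:
    """Toy model: the receive path runs the handler to completion itself."""

    def __init__(self, handler):
        self.handler = handler

    def receive(self, packet):
        # No intermediate queue, so no handoff drop and no context switch.
        self.handler(packet)
```

In this model a burst that arrives before run_netisr() is called overflows
the queue in the queued case, while the direct case processes every packet at
the cost of doing all the work in the caller.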
> In previous FreeBSD releases, such as several 5.x series releases,
> net.isr.enable could not be turned on by default because there was
> insufficient synchronization in the network stack. As of 5.5 and 6.0, I
> believe there is sufficient synchronization, especially given that we force
> non-MPSAFE protocol handlers to run in the netisr without direct dispatch.
> As such, there has been a gradual conversation going on about making direct
> dispatch the default behavior in the 7.x development series, and more
> publicly documenting and supporting the use of direct dispatch in the 6.x
> release engineering series.
> Obviously, this is about two things: performance, and stability. Many of us
> have been running with direct dispatch on by default for quite some time, so
> it passes some of the basic "does it run" tests. However, since it
> significantly increases the opportunity for parallelism in the receive path
> of the network stack, it likely will trigger otherwise latent or infrequent
> races and bugs to occur more frequently. The second aspect is performance:
> many results suggest that direct dispatch has a significant performance
> benefit. However, evaluating the impact on a broad range of results is
> required in order for us to go ahead with what is effectively a significant
> architectural change in how we perform network stack processing.
> To give you a sense of some of the performance effect I've measured recently,
> using the netperf measurement tool (with -DHISTOGRAM removed from the FreeBSD
> port build), here are some results. In each case, I've put parentheses
> around host or router to indicate which is the host where the configuration
> change is being tested. These tests were performed using dual Xeon systems,
> and using back-to-back gigabit ethernet cards and the if_em driver:
> TCP round trip benchmark (TCP_RR), host-(host):
> 7.x UP: 0.9% performance improvement
> 7.x SMP: 0.7% performance improvement
> TCP round trip benchmark (TCP_RR), host-(router)-host:
> 7.x UP: 2.4% performance improvement
> 7.x SMP: 2.9% performance improvement
> UDP round trip benchmark (UDP_RR), host-(host):
> 7.x UP: 0.7% performance improvement
> 7.x SMP: 0.6% performance improvement
> UDP round trip benchmark (UDP_RR), host-(router)-host:
> 7.x UP: 2.2% performance improvement
> 7.x SMP: 3.0% performance improvement
> TCP stream benchmark (TCP_STREAM), host-(host):
> 7.x UP: 0.8% performance improvement
> 7.x SMP: 1.8% performance improvement
> TCP stream benchmark (TCP_STREAM), host-(router)-host:
> 7.x UP: 13.6% performance improvement
> 7.x SMP: 15.7% performance improvement
> UDP stream benchmark (UDP_STREAM), host-(host):
> 7.x UP: none
> 7.x SMP: none
> UDP stream benchmark (UDP_STREAM), host-(router)-host:
> 7.x UP: none
> 7.x SMP: none
> TCP connect benchmark (src/tools/tools/netrate/tcpconnect)
> 7.x UP: 7.90383% +/- 0.553773%
> 7.x SMP: 12.2391% +/- 0.500561%
> So in some cases, the impact is negligible -- in other places, it is quite
> significant. So far, I've not measured a case where performance has gotten
> worse, but that's probably because I've only been measuring a limited number
> of cases, and with a fairly limited scope of configurations, especially given
> that the hardware I have is pushing the limits of what the wire supports, so
> minor changes in latency are possible, but not large changes in throughput.
> So other than a summary of the status quo, this is also a call to action. I
> would like to get more widespread benchmarking of the impact of direct
> dispatch on network-related workloads. This means a variety of things:
> (1) Performance of low level network services, such as routing, bridging,
> and filtering.
> (2) Performance of high level application services, such as web and
> (3) Performance of integrated kernel network services, such as the NFS
> client and server.
> (4) Performance of user space distributed file systems, such as Samba and
> All you need to do to switch to direct dispatch mode is set the sysctl or
> tunable "net.isr.direct" to 1. To disable it again, remove the setting, or
> set it to 0. It can be modified at run-time, although during the transition
> from one mode to the other, there may be a small quantity of packet
> misordering, so benchmarking over the transition is discouraged.
> FYI: as of 6.0-RC1 and recent 7.0, net.isr.direct is the name of the
> variable. In earlier releases, the name of this variable was net.isr.enable.
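
For anyone scripting test runs, the toggle could be wrapped roughly as
follows. This is a sketch, not a supported tool: the helper names are made
up, it assumes the usual "sysctl name=value" command form and the
name="value" form for a loader tunable, and it uses the variable names given
in this thread (net.isr.direct on 6.0-RC1 and recent 7.x, net.isr.enable
before that).

```python
import subprocess


def variable_name(uses_new_name=True):
    """Pick the variable name; the caller decides which release is running."""
    return "net.isr.direct" if uses_new_name else "net.isr.enable"


def sysctl_argv(enabled, uses_new_name=True):
    """Build the argument vector for a run-time change via sysctl(8)."""
    value = 1 if enabled else 0
    return ["sysctl", f"{variable_name(uses_new_name)}={value}"]


def tunable_line(enabled, uses_new_name=True):
    """Build the equivalent boot-time tunable line (assumed loader.conf form)."""
    value = 1 if enabled else 0
    return f'{variable_name(uses_new_name)}="{value}"'


def apply_setting(enabled, uses_new_name=True):
    """Run the sysctl command; requires root on the system under test."""
    subprocess.run(sysctl_argv(enabled, uses_new_name), check=True)
```

Because of the possible misordering during the transition, a script would
call apply_setting() between benchmark runs rather than during one.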
> Some important details:
> - Only non-local protocol traffic is affected: loopback traffic still goes
> via the netisr to avoid issues of recursion and lock order.
> - In the general case, only in-bound traffic is directly affected by this
> change. As such, send-only benchmarks may reveal little change. They
> are still interesting, however.
> - However, the send path is indirectly affected due to changes in
> scheduling, workload, interrupt handling, and so on.
> - Because network benchmarks, especially micro-benchmarks, are especially
> sensitive to minor perturbations, I highly recommend running in a
> minimal multi-user or ideally single-user environment, and suggest
> isolating undesired sources of network traffic from segments where
> testing is occurring. For macro-benchmarks this can be less important,
> but should be paid attention to.
> - Please make sure debugging features are turned off when running tests --
> especially WITNESS, INVARIANTS, INVARIANT_SUPPORT, and user space malloc
> debugging. These can have a significant impact on performance, both
> potentially overshadowing changes, and in some cases, actually reversing
> results (due to higher overhead under locks, for example).
> - Do not use net.isr.enable in the 5.x line unless you know what you are
> doing. While it is reasonably safe with 5.4 forwards, it is not a
> supported configuration, and may cause stability issues with specific
> - What we're particularly interested in is a statistically meaningful
> comparison of the "before" and "after" case. When doing measurements, I
> like to run 10-12 samples, and usually discard the first one or two,
> depending on the details of the benchmark. I'll then use
> src/tools/tools/ministat to compare the data sets. Running a number of
> samples is quite important, because the variance in many tests can be
> significant, and if the two sample sets overlap, you can quite easily
> draw the entirely wrong conclusion about the results from a small number
> of measurements in a sample.
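
The sample handling described in the last item above might be scripted along
these lines. This is a minimal sketch: the function names are hypothetical,
the default of two discarded warm-up runs is just the upper end of "one or
two", and it assumes ministat is given one value per line.

```python
def trim_warmup(samples, discard=2):
    """Drop the first `discard` runs, which often include warm-up effects."""
    if discard < 0:
        raise ValueError("discard must be non-negative")
    return list(samples[discard:])


def to_ministat_text(samples):
    """Format one measurement per line, ready to be written to a data file."""
    return "".join(f"{value}\n" for value in samples)
```

One file would be written per configuration (for example one for the queued
runs and one for the direct runs), and the two files then passed to ministat
for comparison.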
> Assuming you have a fixed width font, typical output from ministat looks
> something like the following and may be human readable:
> x 7SMP/tcpconnect_queue
> + 7SMP/tcpconnect_direct
> |x xx + +|
> |xxxxx xx ++ +++++ +|
> ||__A__| |___A__| |
>     N           Min           Max        Median           Avg        Stddev
> x  10          5425          5503          5460        5456.3     26.284977
> +  10          6074          6169          6126        6124.1     31.606785
> Difference at 95.0% confidence
>         667.8 +/- 27.3121
>         12.2391% +/- 0.500561%
>         (Student's t, pooled s = 29.0679)
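
The "Difference at 95.0% confidence" figures in the output above can be
cross-checked from the N, Avg and Stddev columns alone. The sketch below
assumes a pooled-variance Student's t interval, as the output's last line
suggests, and hard-codes a t-table critical value of 2.101 for 18 degrees of
freedom; the function name is hypothetical and this is not ministat's own
code.

```python
import math

# Two-sided 95% critical value of Student's t for 18 degrees of freedom
# (10 + 10 - 2), rounded as in a standard t table. This is an assumption.
T_CRIT_95_DF18 = 2.101


def pooled_difference(n1, avg1, sd1, n2, avg2, sd2, t_crit):
    """Difference of means with a pooled-variance confidence half-width."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    pooled_sd = math.sqrt(pooled_var)
    diff = avg2 - avg1
    half_width = t_crit * pooled_sd * math.sqrt(1.0 / n1 + 1.0 / n2)
    return {
        "pooled_sd": pooled_sd,
        "diff": diff,
        "half_width": half_width,
        # Percentages are relative to the first ("x") data set.
        "diff_pct": 100.0 * diff / avg1,
        "half_width_pct": 100.0 * half_width / avg1,
    }


# Summary statistics copied from the ministat output above.
result = pooled_difference(10, 5456.3, 26.284977,
                           10, 6124.1, 31.606785,
                           T_CRIT_95_DF18)
```

If the half-width were larger than the difference itself, the two sample sets
would not be distinguishable at this confidence level.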
> Of particular interest is if changing to direct dispatch hurts performance in
> your environment, and understanding why that is.
> Robert N M Watson