Throughput problems with dummynet and high delays

Andy Jones andjones at gmail.com
Tue Oct 31 00:21:09 UTC 2006


  Hi,

I'm a researcher at the University of North Carolina trying to simulate
certain link characteristics using dummynet pipes in ipfw. Our end goal is
to thoroughly test high speed TCP variants in our experimental network in a
wide range of situations (which includes varying the delay from 1ms to
200ms).

I have two Dell PowerEdge 2850 servers connected to each other through
Intel Gigabit Ethernet cards (although I'm not sure of the exact model). They
both run FreeBSD 6.0. I'm using iperf to push as many bits through the wire
as possible.

Without dummynet, sustained throughput is as expected, close to 1Gbps:
[  3]  0.0-180.0 sec  19.2 GBytes    918 Mbits/sec

When dummynet is used to add delay (100ms in my case) to the network, the
machines have problems sustaining high throughput.

Here is the setup on the receiver end:
% sysctl kern.ipc.maxsockbuf=16M
% sysctl net.inet.tcp.recvspace=12MB
% iperf -s

and on the sender end:
% sysctl kern.ipc.maxsockbuf=16M
% sysctl net.inet.tcp.sendspace=12MB
% ipfw pipe 1 config delay 100
% ipfw add 10 pipe 1 ip from any to any out
% iperf -c <machine> [args ...]

kern.ipc.nmbclusters has also been tuned to 65536 at boot time. Our kernel
also has HZ=1000. The ipfw rule is added such that it is the first rule
in the chain. 12MB is about the right size for the send buffer given the
bandwidth-delay product (1Gbps * 0.1s RTT / 8 bits/byte = 12.5MB). We're
also using an MTU of roughly 9000 bytes.
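
As a rough check of the buffer sizing above, the bandwidth-delay product
can be worked out directly. This is only a sketch: the function and
variable names are made up for illustration, the 100ms pipe delay is taken
as the whole RTT, and the "roughly 9000 byte" MTU is treated as exactly
9000 bytes with headers ignored.

```python
def bdp_bytes(link_bps: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes, using integer arithmetic."""
    # bits/s * ms -> divide by 1000 for seconds and by 8 for bytes
    return link_bps * rtt_ms // (8 * 1000)


LINK_BPS = 1_000_000_000  # 1Gbps link
RTT_MS = 100              # delay configured on the dummynet pipe
MTU_BYTES = 9000          # assumption: "roughly 9000" taken as exactly 9000

bdp = bdp_bytes(LINK_BPS, RTT_MS)
# Whole MTU-sized packets needed to fill the pipe (headers ignored).
full_frames = bdp // MTU_BYTES

print(f"BDP: {bdp} bytes")
print(f"MTU-sized packets in flight: about {full_frames}")
```

Under these assumptions the pipe holds about 12.5MB, or somewhat under
1400 MTU-sized packets, which would all be sitting in the dummynet delay
line once the window is fully open.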

What happens is that as the TCP window grows larger (to about 3-4MB), the
sender spends most of its time processing interrupts (80-90% as reported by
top) and throughput peaks at about 300Mbps. I've dug into the dummynet code
and found that a large amount of time is spent in the routine
transmit_event(struct dn_pipe *p), which dequeues packets from a pipe and
calls ip_output. It appears that ip_output is the culprit, but I'm not sure
what it is doing with its time. No packets are being dropped according to
TCP and dummynet. I suspect that either the pfil_run_hooks(...) call or the
(*ifp->if_output)(...) call in ip_output is taking too much time, but I'm
not sure.

Any suggestions on what could be happening would be appreciated!

  -Andy Jones

