dummynet, em driver, device polling issues :-((
toasty at dragondata.com
Wed Oct 5 05:51:57 PDT 2005
On Oct 5, 2005, at 7:21 AM, Ferdinand Goldmann wrote:
>> In one case, we had a system acting as a router. It was a Dell
>> PowerEdge 2650, with two dual "server" adapters, each on a
>> separate PCI bus. Three were "lan" links, and one was a "wan" link.
>> The lan links were receiving about 300mbps each, all going out the
>> "wan" link at near 900mbps at peak. We were never able to get
>> above 944mbps, but I never cared enough to figure out where the
>> bottleneck was there.
> 944mbps is a very good value, anyway. What we see in our setup are
> throughput rates around 300mbps or below. When testing with tcpspray,
> throughput hardly exceeded 13MB/s.
> Are you running vlans on your interface? Our em0-card connects
> several sites together, which are all sitting on separate vlan
> interfaces for which the em0 acts as parent interface.
Two of the interfaces had vlans, two didn't.
>> This was with PCI-X, and a pretty stripped config on the server side.
> Maybe this makes a difference, too. We only have a quite old
> xSeries 330 with PCI and a 1.2GHz CPU.
I think that's a really important point. If you're running a "normal"
32 bit 33MHz PCI bus, the math just doesn't work for high speeds. The
entire bandwidth of the bus is just a tad over 1gbps. Assuming 100%
efficiency (you receive a packet, then turn around and resend it
immediately) you'll only be able to reach 500mbps. When you add in
the overhead of each PCI transaction, the fact that the CPU can't
instantly turn around and send data out the same cycle that the last
packet was finished being received, and other inefficiencies you will
probably only see something in the 250-300mbps range at MOST, if that.
I believe the xSeries 330 uses 64 bit 33MHz slots though. That gives
you double the bandwidth to play with. But, I'm still not convinced
that the CPU isn't the bottleneck there. If you know you're running
64/33 in the slot you have the card in, I'd be willing to say you
could do 500mbps or so at peak. A bunch of IPFW rules, the CPU just
not being able to keep up, other activity on the system, or a complex
routing table will reduce that.
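The bus arithmetic above can be sketched in a few lines. The efficiency factor below is an illustrative guess, not a measured value; the raw bandwidth and the halving for store-and-forward follow directly from the discussion:

```python
# PCI bandwidth and the forwarding ceiling it implies.
def pci_gbps(width_bits, clock_mhz):
    """Raw PCI bandwidth in gigabits per second."""
    return width_bits * clock_mhz * 1e6 / 1e9

for width, clock in [(32, 33), (64, 33)]:
    raw = pci_gbps(width, clock)
    # Forwarding halves it: every packet crosses the shared,
    # half-duplex bus twice (NIC -> RAM, then RAM -> NIC).
    ceiling = raw / 2
    # Assume (illustratively) ~55% real-world bus efficiency
    # after transaction setup, descriptor traffic, etc.
    practical = ceiling * 0.55
    print(f"{width}/{clock}: raw {raw:.2f} gbps, "
          f"forwarding ceiling {ceiling:.2f} gbps, "
          f"realistic ~{practical * 1000:.0f} mbps")
```

For 32/33 this lands right around the 250-300mbps figure above, and for 64/33 around 500mbps.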
Just to sum up:
A 64/33MHz bus has a theoretical speed of 2gbps. If you're
forwarding packets in one interface and out another, you have to cut
that in half: PCI is half duplex, so you can't receive and send at
the same time. That leaves 1gbps. PCI itself isn't 100% efficient.
You burn cycles setting up each PCI transaction. When the card
busmasters to dump the packet into RAM, it frequently will have to
wait for the memory controller to proceed. The ethernet card itself
requires some PCI bandwidth to operate - the kernel needs to check
its registers, the card has to update pointers in ram for the
busmaster circular buffer, etc. All those things take time on the PCI
bus, leaving maybe 750-800mbps left for actual data.
The rest of the system isn't 100% efficient either. The CPU/kernel/
etc can't immediately turn around a packet to send out the instant
it's received, further lowering your overall bandwidth limit.
I've done a lot of work on custom ethernet interfaces both in FreeBSD
and in custom embedded OS projects. The safe bet is to assume that
you can route/forward 250mbps on 32/33 and 500mbps on 64/33 if you
have enough CPU efficiency to fill the bus.
>> Nothing fancy on polling, I think we set HZ to 10000
> Ten-thousand? Or is this a typo, and did you mean thousand?
> This is weird. :-( Please, is there any good documentation on
> tuning device polling? The man page does not give any useful
> pointers about values to use for Gbit cards. I have already read
> things about people using 2000, 4000HZ ... Gaaah!
> I tried with 1000 and 2000 so far, without good results. It seems
> like everybody makes wild assumptions on what values to use for
We arrived at 10000 by experimentation. A large number of interfaces,
a ton of traffic... I'm not sure of the complete reasons why it
helped, but it did.
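One plausible reason a high HZ helps can be sketched numerically: at gigabit wire speed with minimum-size frames, a low polling rate means more packets arrive between polls than the receive ring can hold. The 256-descriptor ring size below is an assumption for illustration, not something from the thread:

```python
# How many packets arrive between polls at a given kern.hz,
# at gigabit wire speed with minimum-size (64-byte) frames?
# 1 Gbps / (84 bytes on the wire per frame) ~= 1.49M packets/sec.
WIRE_PPS = 1_488_095
RING_SLOTS = 256  # assumed RX descriptor ring size

for hz in (1000, 2000, 4000, 10000):
    pkts_per_poll = WIRE_PPS / hz
    fits = pkts_per_poll <= RING_SLOTS
    print(f"HZ={hz:5d}: ~{pkts_per_poll:.0f} packets/poll "
          f"({'fits in' if fits else 'OVERFLOWS'} a {RING_SLOTS}-slot ring)")
```

Under those assumptions, only HZ=10000 keeps the per-poll batch inside the ring at worst-case packet rates, which would explain the experimental result.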
>> , turned on idle_poll, and set user_frac to 10 because we had some
>> cpu hungry tasks that were not a high priority.
> I think I read somewhere about problems with idle_poll. How high is
> your burst_max value? Are you seeing a lot of ierrs?
No ierrs at all, and we never touched burst_max.
In the end, if you're seeing "Receive No Buffers" incrementing, it
means exactly what it says: the ethernet chip received a packet and
had no room to store it because the CPU hadn't drained the receive
buffers from previous packets yet. Either the CPU is too busy
and can't keep up, the PCI bus is being saturated and the ethernet
chip can't move packets out of its tiny internal memory fast enough,
or there is some polling problem that's being hit here.
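The "Receive No Buffers" condition can be illustrated with a toy ring-buffer model. The sizes and rates here are made up purely for the illustration; the point is that the counter only climbs when the drain rate falls behind the arrival rate:

```python
# Toy model: the NIC enqueues packets into a fixed-size ring; the
# CPU drains some number per tick. When the ring is full, the NIC
# drops the packet and bumps a "receive no buffers" counter.
def simulate(ring_slots, arrivals_per_tick, drains_per_tick, ticks):
    queued = 0
    no_buffers = 0
    for _ in range(ticks):
        for _ in range(arrivals_per_tick):
            if queued < ring_slots:
                queued += 1          # packet lands in the ring
            else:
                no_buffers += 1      # ring full: counter increments
        queued = max(0, queued - drains_per_tick)  # CPU catches up
    return no_buffers

# CPU keeps up: the counter stays at zero.
print(simulate(256, 100, 120, 1000))   # 0
# CPU slightly too slow: the ring fills and the counter climbs.
print(simulate(256, 100, 80, 1000))
```

Whether the slow drain comes from a busy CPU, a saturated PCI bus, or a polling misconfiguration, the symptom at the chip is the same.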
If I had to bet, what I think is happening is that you've got a
bottleneck somewhere (ipfw rules, not enough CPU, too much PCI
activity, or you're in a 32/33 PCI slot) and it can't keep up with
what you're doing. Turning polling on is exposing different symptoms
of this than having polling off. Polling may be increasing your
overall speed enough that instead of packets getting backed up in
the kernel and stopping you from going higher than XXmbps, with
polling you're getting to XXXmbps and seeing a new symptom of the
same underlying bottleneck.
> Forgot to ask - do you have fastforwarding enabled in your sysctl?
No. But we were running either 2.8 or 3.2GHz P4 Xeons, so we had the
CPU to burn.