em0, polling performance, P4 2.8GHz FSB 800MHz

Deepak Jain deepak at ai.net
Sat Feb 28 20:56:29 PST 2004



Don Bowman wrote:

>>It was kindly pointed out that I didn't include the symptoms of the
>>problem:
>>
>>Without polling on, I get 70+% interrupt load, and I get livelock.
>>
>>With polling on, I start getting huge amounts of input errors, packet
>>loss, and general unresponsiveness to the network. The web server on it
>>doesn't respond, though it occasionally will open the connection, just
>>not respond. accept_filter on/off makes no difference. I have read other
>>posts that say em systems can move >200kpps without serious incident.
>>
>>Thanks in advance,
>>
>>DJ
> 
> 
> You may need to increase the MAX_RXD inside your em driver to e.g. 512.

I didn't know whether my card had a buffer larger than the default 256. I
can increase it, but I wasn't sure how to determine how large a MAX_RXD my
card would support. When the system was under load, it was generating
2xHZ clock ticks (2000 when HZ was 1000). Is that normal?
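
If I'm reading you right, the change would be something like the
following (I'm assuming the define lives in the stock if_em.h under
/usr/src; the exact path and default may differ by driver version):

    # locate the current descriptor count
    grep MAX_RXD /usr/src/sys/dev/em/if_em.h
    # edit, e.g.  #define MAX_RXD 256  ->  #define MAX_RXD 512
    # then rebuild the kernel (or the if_em module) and reboot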

> With a similar system, em can handle ~800Kpps of bridging.

What settings did you use?

> Your earlier email showed a very large number of RST messages,
> which makes me suspect the blackhole actually wasn't enabled.
> 
> Not exactly sure what you're trying to do here. It sounds like
> you are trying to generate a SYN flood on port 80, and your listen
> queue is backing up. You've increased kern.ipc.somaxconn? Does your
> application specify a fixed listen queue depth? Could it be increased?
> Are you using apache as the server? Could you use a kqueue-enabled
> one like thttpd?

Using apache; might go to squid or thttpd. Didn't think it would make a
big difference. Increased somaxconn. Basically the system is getting hammered
(after all filtering at the router) with valid GET requests on port 80.
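
For reference, in case it helps someone else, the knobs in question are
roughly the following (values illustrative, not tuned recommendations):

    # check and raise the listen queue ceiling
    sysctl kern.ipc.somaxconn
    sysctl -w kern.ipc.somaxconn=1024

Apache also caps its own backlog with the ListenBacklog directive in
httpd.conf, so raising somaxconn alone may not help if the application
asks for a smaller queue.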

> Have you checked net.inet.ip.intr_queue_drops?
> If it's showing >0, check net.inet.ip.intr_queue_maxlen, perhaps
> increase it.

net.inet.ip.intr_queue_maxlen: 500
net.inet.ip.intr_queue_drops: 0
p1003_1b.sigqueue_max: 0

No intr drops.
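
If drops ever do show up, I gather the queue length can be bumped at
runtime, something like (value illustrative):

    sysctl -w net.inet.ip.intr_queue_maxlen=2000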

> 
> Have you sufficient mbufs and clusters? netstat -m.
> 

1026/5504/262144 mbufs in use (current/peak/max):
         1026 mbufs allocated to data
1024/5460/65536 mbuf clusters in use (current/peak/max)
12296 Kbytes allocated to network (6% of mb_map in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines

mbufs look fine.
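
If the cluster peak ever gets near the max, my understanding is the
ceiling is a boot-time tunable on this branch, so something like
(value illustrative):

    # boot-time tunable; takes effect after reboot
    echo 'kern.ipc.nmbclusters="131072"' >> /boot/loader.conf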

> If you want to spend more time in the kernel, perhaps change
> kern.polling.user_frac to 10?

I'll do that.
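
For the archives, the knobs in question (kernel built with
DEVICE_POLLING; values are just the ones discussed here, not
recommendations):

    sysctl -w kern.polling.enable=1
    sysctl -w kern.polling.user_frac=10   # give more of each tick to the kernel
    # HZ itself is a kernel config option (options HZ=1000), not a sysctl
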
> 
> I might have HZ @ 2500 as well.
> 
> You could use ipfw to limit the damage of a SYN flood, e.g.
> a keep-state rule with a limit of ~2-5 per source IP, lower the
> timeouts, increase the hash buckets in ipfw, etc. This would
> use a mask on src-ip of all bits. Something like:
> allow tcp from any to any setup limit src-addr 2

This is a great idea. We were trapping those who crossed our connection 
thresholds and blackholing them upstream (automatically, with a script).
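
For the archives, a minimal version of what I think Don is describing
(rule numbers and placement are mine; adjust to the existing ruleset):

    # dynamic-rule lookups happen here
    ipfw add 1000 check-state
    # each source IP may hold at most 2 tracked TCP sessions
    ipfw add 1100 allow tcp from any to any setup limit src-addr 2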


> 
> This would only allow 2 concurrent TCP sessions per unique
> source address. Depends on the SYN flood you are expecting
> to experience. You could also use dummynet to shape SYN
> traffic to a fixed level, I suppose.
> 
> Now... this will move the DoS condition elsewhere in
> the kernel, and it might not win you anything.
> net.inet.ip.fw.dyn_buckets=16384
> net.inet.ip.fw.dyn_syn_lifetime=5
> net.inet.ip.fw.dyn_max=32000
> 
> might be called for if you try that approach.
> 

I see where that should get us. We'll see.
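
If we go that route, I'd probably just drop Don's values into
/etc/sysctl.conf so they persist across reboots:

    net.inet.ip.fw.dyn_buckets=16384
    net.inet.ip.fw.dyn_syn_lifetime=5
    net.inet.ip.fw.dyn_max=32000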

Thanks!

DJ


