Synopsis: process swi1: net, taskq em0 and dummynet gives 100% CPU usage
Sergey Pronin
sepron at gmail.com
Mon Mar 16 08:41:01 PDT 2009
Related to: http://lists.freebsd.org/pipermail/freebsd-net/2009-February/021120.html
Regardless of the conditions (no heavy load, not much traffic passing through, not many ng nodes), the server stops working properly. The failure shows up in one of three patterns (a sketch of how they are observed follows list C):
A:
1) swi1: net is at 100% CPU usage.
2) The server does not respond to ICMP echo requests.
3) ssh does not work either.
4) The mpd process shows an "ngsock" state in top.
5) Rebooting the server helps.
B:
1) The em0 taskq is at 100% CPU usage.
2) There are watchdog timeouts in /var/log/messages.
3) The server does not respond to ICMP echo requests.
4) ssh does not work either.
5) The mpd process shows an "ngsock" state in top.
6) Rebooting the server helps.
7) swi1: net stays at 0%.
C:
1) The dummynet process is at 100% CPU usage.
2) The server does not respond to ICMP echo requests.
3) ssh does not work either.
4) The mpd process shows an "ngsock" state in top.
5) Rebooting the server helps.
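In each of the three cases the busy thread shows up in top. Assuming a stock FreeBSD 7.x userland, the observation comes down to roughly:

top -SH      # -S shows system (kernel) processes, -H shows threads; the offender (swi1: net, em0 taskq or dummynet) sits at ~100% WCPU
vmstat -i    # per-device interrupt rates, to rule out a plain interrupt storm on em0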
I have a few servers:
Boards: Intel S3200SH with a Q8200 or E8600 CPU
NICs: 82566DM-2 or 82571EB (em driver)
OSes: FreeBSD 7.0-RELEASE-p10, FreeBSD 7.0-RELEASE-p9, FreeBSD 6.4-RELEASE-p3
Software: mpd 4.4.1 (PPPoE), ipfw with dummynet shaping, pf (NAT only)
Only the em0 card is in use, with about 550 VLANs on it.
About 2000 ng nodes are created.
About 500-700 simultaneous PPPoE sessions at peak hours.
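For reference, each of those VLANs is an ordinary vlan(4) interface cloned on top of em0; a minimal sketch of one of them (the VLAN ID is made up) would be:

ifconfig vlan101 create vlan 101 vlandev em0
ifconfig vlan101 up

mpd then attaches its PPPoE hooks to the ng_ether node of each such interface.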
kernel:
device bpf # Berkeley packet filter
device pf
options IPFIREWALL
options IPFIREWALL_VERBOSE
options IPFIREWALL_FORWARD
options IPFIREWALL_VERBOSE_LIMIT=1000
options IPFIREWALL_DEFAULT_TO_ACCEPT
options IPDIVERT # divert sockets
options DUMMYNET # ipfw pipes/queues for shaping
options DEVICE_POLLING # interface polling
options HZ=2000 # higher tick rate for polling/dummynet granularity
options NETGRAPH
options NETGRAPH_ETHER
options NETGRAPH_IFACE
options NETGRAPH_SOCKET
options NETGRAPH_PPP
options NETGRAPH_TCPMSS # TCP MSS adjustment node
options NETGRAPH_TEE
options NETGRAPH_VJC # Van Jacobson header compression node
options NETGRAPH_PPPOE
On some servers, netgraph is loaded as modules instead of being compiled in, and the polling option is commented out.
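For those servers, the loader.conf entries corresponding to the options above (mpd can also load most of these modules on demand) would look roughly like:

netgraph_load="YES"
ng_ether_load="YES"
ng_iface_load="YES"
ng_socket_load="YES"
ng_ppp_load="YES"
ng_pppoe_load="YES"
ng_tcpmss_load="YES"
ng_tee_load="YES"
ng_vjc_load="YES"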
sysctl.conf:
net.inet.ip.intr_queue_maxlen=1000
net.inet.tcp.blackhole=2
net.inet.udp.blackhole=1
net.inet.ip.dummynet.hash_size=1024   # hash table size for dynamic pipes/queues
net.inet.ip.dummynet.io_fast=1        # pass packets straight through when the pipe is not backlogged
net.inet.ip.fw.one_pass=1             # packets leave ipfw after the pipe instead of re-entering the ruleset
net.inet.ip.fastforwarding=1          # fast IP forwarding path
net.isr.direct=0                      # queue packets to the netisr thread (swi1: net) instead of dispatching directly
#net.inet.ip.portrange.randomized=0
net.inet.tcp.syncookies=1
kern.ipc.maxsockbuf=1048576
net.graph.maxdgram=524288
net.graph.recvspace=524288
net.inet.ip.portrange.first=1024
net.inet.ip.portrange.last=65535
dev.em.0.rx_int_delay=160         # interrupt moderation (receive)
dev.em.0.rx_abs_int_delay=160
dev.em.0.tx_int_delay=160         # interrupt moderation (transmit)
dev.em.0.tx_abs_int_delay=160
dev.em.0.rx_processing_limit=200  # max packets handled per receive interrupt/poll pass
loader.conf:
autoboot_delay="2"
kern.ipc.maxpipekva=10000000
net.graph.maxalloc=2048 # limit on queued netgraph items
hw.em.rxd="512"         # receive descriptors per ring
hw.em.txd="1024"        # transmit descriptors per ring
About 30 ipfw rules and 2 rules for shaping:
00300 pipe tablearg ip from any to table(4) out via ng*
00301 pipe tablearg ip from table(5) to any in via ng*
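With pipe tablearg, the pipe number for each packet is taken from the value stored in the table, so every subscriber entry carries its own pipe. A minimal sketch of the matching table and pipe setup (addresses, pipe numbers and bandwidths are made up):

ipfw table 4 add 10.0.0.15/32 110   # value 110 = download pipe for this subscriber
ipfw table 5 add 10.0.0.15/32 111   # value 111 = upload pipe
ipfw pipe 110 config bw 2048Kbit/s
ipfw pipe 111 config bw 512Kbit/s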
I have tested different network cards with different chipsets.
With and without lagg0.
I had the same problems with FreeBSD 7.1-RELEASE-p1/p2.
I also tried starting the servers without the em tuning in loader.conf and sysctl.conf.
Server uptime before the problem hits varies from one week to two months.
I have two other servers with the same hardware but without dummynet, netgraph, or mpd. They run only quagga + BGP, with the same chipsets, on FreeBSD 7.0-RELEASE-p10. No problems at all.
IMHO the problem is somewhere in netgraph: something is causing an infinite loop.
Any ideas?