running out of mbufs?

Dave Seddon dave-sender-1932b5 at seddon.ca
Wed Aug 3 03:49:36 GMT 2005


Greetings, 

I'm doing performance testing of a content filtering system, so I'm trying 
to get very high HTTP throughput.  I've got 4 HP DL380s with 3.4 GHz Xeon 
processors (hyper-threading), 1 GB RAM, 2 onboard bge interfaces, and 2 
dual-port em cards each.  Using FreeBSD 5.4-STABLE (as of 2005/08/02) and 
device polling, I've configured a large number (246) of VLAN interfaces on 
two machines, with Apache on one box and siege on the other.  Using 'siege 
-f /home/my_big_list_of_urls -c 50 --internet', one host makes a large 
number of requests to the other.  I've been trying to tune for maximum 
performance, using lots of examples for /etc/sysctl.conf and so on from the 
web.  Adjusting these settings and running the siege, I've found the Apache 
server completely loses network connectivity when device polling is 
enabled.  I've adjusted HZ a lot and found the system survives longest with 
it set at 15000 (yes, it seems very large, doesn't it).  The problem now 
seems to be that I'm running out of mbufs: 

 --------------------------------------
host228# netstat -m
4294264419 mbufs in use
4294866740/2147483647 mbuf clusters in use (current/max)
0/3/6656 sfbufs in use (current/peak/max)
3817472 KBytes allocated to network
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines
 --------------------------------------
host228# sysctl kern.polling
kern.polling.burst: 671
kern.polling.each_burst: 100
kern.polling.burst_max: 1000
kern.polling.idle_poll: 0
kern.polling.poll_in_trap: 0
kern.polling.user_frac: 70
kern.polling.reg_frac: 40
kern.polling.short_ticks: 3523
kern.polling.lost_polls: 49996588
kern.polling.pending_polls: 1
kern.polling.residual_burst: 0
kern.polling.handlers: 2
kern.polling.enable: 1
kern.polling.phase: 0
kern.polling.suspect: 1768262
kern.polling.stalled: 9
kern.polling.idlepoll_sleeping: 1
 ------------------------------------- 
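
In case it matters, polling support is compiled into the kernel roughly 
like this (a sketch of my config additions; the HZ value is the one that 
currently survives longest):
 -------------------------------------
# custom kernel config (sketch)
options         DEVICE_POLLING  # compile in polling(4) support
options         HZ=15000        # polling work is scheduled per clock tick
 -------------------------------------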

For some reason, the 'current' can be WAAAY higher than the 'max', which 
seems very odd.  (4294264419 is within a whisker of 2^32 = 4294967296, and 
the cap is exactly 2^31 - 1, so I suspect these are 32-bit counters that 
have wrapped.)  I've tried putting the 'max' right up to 5 billion; however, 
it only goes to 2.1 billion. 

How should I proceed further?
How come the box loses all connectivity, rather than just some TCP streams 
failing?  Why doesn't the network recover when I stop the siege?
Why does kern.polling.burst_max only go to 1000 when I try setting it to 
1500?  (It clamps at exactly 1000, so perhaps there's a compiled-in ceiling 
in the polling code?) 


Settings:
 ----------------------------------------------------------
host228# sysctl kern.polling
kern.polling.burst: 684
kern.polling.each_burst: 100
kern.polling.burst_max: 1000
kern.polling.idle_poll: 0
kern.polling.poll_in_trap: 0
kern.polling.user_frac: 70
kern.polling.reg_frac: 40
kern.polling.short_ticks: 97
kern.polling.lost_polls: 8390
kern.polling.pending_polls: 0
kern.polling.residual_burst: 0
kern.polling.handlers: 2
kern.polling.enable: 1
kern.polling.phase: 0
kern.polling.suspect: 3642
kern.polling.stalled: 0
kern.polling.idlepoll_sleeping: 1
 ------------------------------------------------------------
host228# cat /etc/sysctl.conf
#kern.polling.enable=1
kern.polling.enable=1 

#kern.polling.user_frac: 50
#kern.polling.reg_frac: 20
kern.polling.user_frac=70
kern.polling.reg_frac=40 

#kern.polling.burst: 5
#kern.polling.each_burst: 5
#kern.polling.burst_max: 150  #default, sized for 100 Mbit/s

kern.polling.burst=1000
kern.polling.each_burst=100
kern.polling.burst_max=2000 

#example I found on the web
#kern.polling.burst: 1000
#kern.polling.each_burst: 80
#kern.polling.burst_max: 1000 

#net.inet.tcp.sendspace: 32768
#net.inet.tcp.recvspace: 65536
net.inet.tcp.sendspace=1024000
net.inet.tcp.recvspace=1024000 

#Activate window scaling and timestamp options according to RFC 1323.
net.inet.tcp.rfc1323=1
net.inet.tcp.delayed_ack=0 

#kern.ipc.maxsockbuf: 262144
kern.ipc.maxsockbuf=20480000 

#The kern.ipc.somaxconn sysctl variable limits the size of the listen queue
#for accepting new TCP connections.  The default value of 128 is typically
#too low for robust handling of new connections in a heavily loaded web
#server environment.
#kern.ipc.somaxconn: 128
kern.ipc.somaxconn=1024 

#TCP bandwidth delay product limiting is similar to TCP/Vegas in NetBSD.
#It can be enabled by setting the net.inet.tcp.inflight.enable sysctl
#variable to 1.  The system will attempt to calculate the bandwidth delay
#product for each connection and limit the amount of data queued to the
#network to just the amount required to maintain optimum throughput.
#This feature is useful when serving data over modems, Gigabit Ethernet,
#or high speed WAN links (or any other link with a high bandwidth delay
#product), especially if you are also using window scaling or have
#configured a large send window.  If you enable this option, you should
#also be sure to set net.inet.tcp.inflight.debug to 0 (disable debugging),
#and for production use, setting net.inet.tcp.inflight.min to at least
#6144 may be beneficial. 

#these are the defaults
#net.inet.tcp.inflight.enable: 1
#net.inet.tcp.inflight.debug: 0
#net.inet.tcp.inflight.min: 6144
#net.inet.tcp.inflight.max: 1073725440
#net.inet.tcp.inflight.stab: 20 

#Disable entropy harvesting for ethernet devices and interrupts.  There are
#optimizations present in 6.x that have not yet been backported that reduce
#the overhead of entropy harvesting, but you can get the same benefits by
#disabling it.  In your environment, it's likely not needed.  I hope to
#backport these changes in a couple of weeks to 5-STABLE.
kern.random.sys.harvest.ethernet=0
kern.random.sys.harvest.interrupt=0
 --------------------------------------------------
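(Side note: I gather kern.ipc.nmbclusters can also be seeded at boot as a 
loader tunable, so the cluster map is sized before anything starts; a 
sketch, with an illustrative value rather than the one I should use:)
 --------------------------------------------------
# /boot/loader.conf (sketch)
kern.ipc.nmbclusters="65536"    # boot-time mbuf cluster ceiling
 --------------------------------------------------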
host228# sysctl -a | grep ipc | grep nm
kern.ipc.nmbclusters: 25600
host228# sysctl kern.ipc.nmbclusters=5000000000
kern.ipc.nmbclusters: 25600 -> 2147483647
host228# sysctl -a | grep ipc | grep nm
kern.ipc.nmbclusters: 2147483647
 -------------------------------------------------
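(A back-of-the-envelope check, assuming the usual 2 KB cluster size: the 
clamped maximum describes far more memory than this 1 GB box could ever 
back:)
 -------------------------------------------------
# 2147483647 clusters * 2048 bytes each, expressed in whole GB:
host228# echo '2147483647 * 2048 / 1024 / 1024 / 1024' | bc
4095
 -------------------------------------------------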
host228# sysctl -a | grep hz
kern.clockrate: { hz = 15000, tick = 66, profhz = 1024, stathz = 128 }
debug.psmhz: 20
 --------------------------------------------------
THE PHYSICAL INTERFACES ONLY (I'm only using 1 interface per 2-port card, 
and only running performance tests on the em cards)
bge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
       options=1a<TXCSUM,VLAN_MTU,VLAN_HWTAGGING>
       inet 192.168.1.228 netmask 0xffffff00 broadcast 192.168.1.255
       ether 00:12:79:cf:d0:bf
       media: Ethernet autoselect (1000baseTX <full-duplex>)
       status: active
bge1: flags=8802<BROADCAST,SIMPLEX,MULTICAST> mtu 1500
       options=1a<TXCSUM,VLAN_MTU,VLAN_HWTAGGING>
       ether 00:12:79:cf:d0:be
       media: Ethernet autoselect (none)
       status: no carrier
em0: flags=18843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,POLLING> mtu 1500
       options=4b<RXCSUM,TXCSUM,VLAN_MTU,POLLING>
       ether 00:11:0a:56:ab:3a
       media: Ethernet autoselect (1000baseTX <full-duplex>)
       status: active
em1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
       options=4b<RXCSUM,TXCSUM,VLAN_MTU,POLLING>
       ether 00:11:0a:56:ab:3b
       media: Ethernet autoselect
       status: no carrier
em2: flags=18843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,POLLING> mtu 1500
       options=4b<RXCSUM,TXCSUM,VLAN_MTU,POLLING>
       ether 00:11:0a:56:b2:4c
       media: Ethernet autoselect (1000baseTX <full-duplex>)
       status: active
em3: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
       options=4b<RXCSUM,TXCSUM,VLAN_MTU,POLLING>
       ether 00:11:0a:56:b2:4d
       media: Ethernet autoselect
       status: no carrier
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
       inet 127.0.0.1 netmask 0xff000000
 --------------------------------------- 
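
The 246 VLANs hang off the active em interfaces; they were created along 
these lines (a sketch from memory; the tag and addresses are illustrative):
 ---------------------------------------
# per-VLAN setup (sketch; repeated for each tag)
ifconfig vlan100 create
ifconfig vlan100 vlan 100 vlandev em0
ifconfig vlan100 inet 10.0.100.228 netmask 255.255.255.0 up
 ---------------------------------------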

Regards,
Dave Seddon
das-keyword-net.6770cb at seddon.ca

