identifying and fixing server I/O slowdowns
Jeff Kramer
jeffk at well.com
Fri Aug 6 00:17:04 PDT 2004
Oh great and wise FreeBSD gurus,
I've been running FreeBSD boxes for about five years with great
results (six of them at the moment), but recently one of my machines has
started to act up seriously. Every time a heavy disk operation (say,
tar'ing a 1 GB directory) runs, the system slows to a crawl, and
requests to the Apache/PHP/MySQL sites hosted on it just hang.
The system is a dual P3 1.13 GHz box with 1 GB of RAM and a mirrored
pair of 80 GB WD800BB drives on a Promise TX2 controller. The RAID isn't
degraded. There's a dedicated 1.5 GB swap partition plus a swap file
on the /usr partition; I added the swap file after some Apache
processes ran away one time.
We run about 15 jails on the machine, with MySQL on the host itself
and Apache/PHP running inside the jails. I initially thought a rogue
process was taking down the machine, but it seems that any sustained
heavy disk activity brings on the slowdown. It doesn't happen
instantly, but after a minute or two things slow to a crawl.
I've recompiled the kernel a few times, upgraded to the latest
4-STABLE rev, and even turned on device polling, but nothing seems to
be helping. It doesn't seem to happen on another machine we have
with identical hardware.
My sysctl.conf:
kern.ipc.somaxconn=4096
net.inet.tcp.sendspace=32768
net.inet.tcp.recvspace=32768
net.inet.icmp.drop_redirect=1
net.inet.icmp.log_redirect=1
net.inet.ip.redirect=0
net.inet6.ip6.redirect=0
net.link.ether.inet.max_age=1200
net.inet.icmp.bmcastecho=0
net.inet.icmp.maskrepl=0
kern.maxfiles=65536
kern.ipc.shm_use_phys=1
kern.polling.enable=1
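Since an identical machine doesn't show the problem, one cheap check is to diff the two boxes' /etc/sysctl.conf files. The Python sketch below is a hypothetical helper (parse_sysctl_conf and diff_sysctl are names made up for illustration). It assumes the plain name=value format shown above, with # starting a comment, and it compares only what the files say, not the live kernel values.

```python
import sys


def parse_sysctl_conf(text):
    """Return {name: value} for each name=value line; skip blanks and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip()
    return settings


def diff_sysctl(a, b):
    """Return {name: (value_in_a, value_in_b)} for every setting that differs.

    A setting present on only one side is reported with None for the other.
    """
    diffs = {}
    for name in sorted(set(a) | set(b)):
        if a.get(name) != b.get(name):
            diffs[name] = (a.get(name), b.get(name))
    return diffs


if __name__ == "__main__":
    # Usage (hypothetical file names): python3 sysctl_diff.py slow.conf good.conf
    with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
        left = parse_sysctl_conf(f1.read())
        right = parse_sysctl_conf(f2.read())
    for name, (lv, rv) in diff_sysctl(left, right).items():
        print(f"{name}: {lv} vs {rv}")
```

The same comparison is worth doing on the two kernel config files, since the kernel on this box has been rebuilt several times.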
And a netstat -m:
301/928/131072 mbufs in use (current/peak/max):
301 mbufs allocated to data
287/874/32768 mbuf clusters in use (current/peak/max)
1980 Kbytes allocated to network (2% of mb_map in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines
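The current/peak/max triples above can be turned into a headroom figure. A minimal Python sketch, assuming the netstat -m output format shown here (mbuf_headroom is a made-up name). For this box the peaks come to well under 3% of the limits, which together with the zero denied/delayed counters suggests mbuf exhaustion isn't what's stalling Apache.

```python
import re

# A "current/peak/max" line, e.g. "301/928/131072 mbufs in use (current/peak/max):"
TRIPLE = re.compile(r"^(\d+)/(\d+)/(\d+)\s+(.+?)\s+in use\b")


def mbuf_headroom(line):
    """Return (label, current, peak, maximum, peak_percent), or None if the
    line isn't a current/peak/max line."""
    m = TRIPLE.match(line.strip())
    if m is None:
        return None
    current, peak, maximum = (int(g) for g in m.group(1, 2, 3))
    return m.group(4), current, peak, maximum, 100.0 * peak / maximum


if __name__ == "__main__":
    sample = """301/928/131072 mbufs in use (current/peak/max):
301 mbufs allocated to data
287/874/32768 mbuf clusters in use (current/peak/max)
0 requests for memory denied"""
    for line in sample.splitlines():
        result = mbuf_headroom(line)
        if result:
            label, cur, peak, maximum, pct = result
            print(f"{label}: peak {peak} of {maximum} ({pct:.2f}%)")
    # prints:
    # mbufs: peak 928 of 131072 (0.71%)
    # mbuf clusters: peak 874 of 32768 (2.67%)
```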
And here's a typical systat -v snapshot while the machine's 'ok':
3 users    Load  0.32  0.38  0.31    Aug 6 00:03

Mem:KB        REAL           VIRTUAL
          Tot    Share       Tot    Share     Free
Act    221588    38656    747652   117796    39404
All   1024156    41620   1546136   144132
VN PAGER / SWAP PAGER (in out in out):  count 4 3   pages 18 5

Proc:r p d s w  Csw Trp Sys Int Sof Flt
     2 2 70  343 63322119 1156 57 397

4.4%Sys  1.0%Intr  2.5%User  0.0%Nice  92.1%Idl

Interrupts
 1156 total
      fxp0 irq2
   13 ohci0 irq9
   11 mux irq10
      fdc0 irq6
 1004 clk irq0
  128 rtc irq8

    21 cow       186992 wire
   340 zfod      623848 act
   119 ofod      176096 inact
    34 %slo-z     37220 cache
   401 tfree       2184 free
                        daefr
                     15 prcfr
                      5 react
                        pdwake
                        pdpgs
                      1 intrn
                 114304 buf
                    173 dirtybuf
                  70310 desiredvnodes
                  64089 numvnodes
                  54829 freevnodes

Namei     Name-cache    Dir-cache
Calls      hits    %     hits    %
  126       125   99

Disks     ad4    ad6    fd0    md0
KB/t     0.00  16.72   0.00   0.00
tps         0     11      0      0
MB/s     0.00   0.17   0.00   0.00
% busy      0      9      0      0
And here's a systat -v snapshot while the machine's choking:
4 users    Load  0.39  0.35  0.31    Aug 6 00:08

Mem:KB        REAL           VIRTUAL
          Tot    Share       Tot    Share     Free
Act    191344    34248    728736   117268    51916
All   1024676    37500   2075520   144188
VN PAGER / SWAP PAGER (in out in out):  count 1 6   pages 2 67

Proc:r p d s w  Csw Trp Sys Int Sof Flt
     5 2 70  573 74423171 1699 225 367

5.7%Sys  1.9%Intr  7.5%User  0.0%Nice  84.9%Idl

Interrupts
 1698 total
      fxp0 irq2
  335 ohci0 irq9
  236 mux irq10
      fdc0 irq6
  999 clk irq0
  128 rtc irq8

    29 cow       180904 wire
   308 zfod      640404 act
   135 ofod      153116 inact
    43 %slo-z     50252 cache
  1277 tfree       1664 free
                        daefr
                     93 prcfr
                      1 react
                        pdwake
                   2693 pdpgs
                        intrn
                 114304 buf
                    278 dirtybuf
                  70310 desiredvnodes
                  64089 numvnodes
                  52125 freevnodes

Namei     Name-cache    Dir-cache
Calls      hits    %     hits    %
 8693      8196   94       12    0

Disks     ad4    ad6    fd0    md0
KB/t    98.81  16.61   0.00   0.00
tps        13    225      0      0
MB/s     1.23   3.64   0.00   0.00
% busy      2     99      0      0
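Comparing the ad6 column across the two snapshots: the MB/s figure is roughly KB/t * tps / 1024 (1024 fits the displayed numbers better than 1000), and by the utilization law (busy fraction = transfers per second * mean service time) the mean time per transfer is the busy fraction divided by tps. A small Python sketch of that arithmetic, assuming "% busy" is the fraction of the sample interval the disk had I/O outstanding (disk_stats is a hypothetical name):

```python
def disk_stats(kb_per_transfer, tps, busy_pct):
    """Return (MB/s, mean service time in ms) for one disk column of systat -v.

    Throughput is KB/t * tps, converted with 1 MB = 1024 KB.  Mean service
    time uses the utilization law U = X * S, i.e. S = U / X, where U is the
    busy fraction and X the transfers per second.
    """
    mb_per_s = kb_per_transfer * tps / 1024.0
    service_ms = (busy_pct / 100.0) / tps * 1000.0 if tps else 0.0
    return mb_per_s, service_ms


if __name__ == "__main__":
    # ad6 figures copied from the "ok" and "choking" snapshots.
    for label, kbt, tps, busy in [("ok", 16.72, 11, 9), ("choking", 16.61, 225, 99)]:
        mbs, svc = disk_stats(kbt, tps, busy)
        print(f"ad6 {label}: {mbs:.2f} MB/s, ~{svc:.1f} ms per transfer")
```

This prints about 0.18 MB/s and 8.2 ms for the "ok" snapshot and 3.65 MB/s and 4.4 ms for the choking one (systat shows 0.17 and 3.64; its inputs are already rounded). So during the slowdown ad6 is 99% busy while moving only about 3.6 MB/s in ~16 KB transfers, which is consistent with the disk being saturated by a stream of small requests rather than by raw bandwidth; anything else that needs that disk would have to queue behind them.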
Thoughts? Is there any way to stop a single process from monopolizing
the disk controller?
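One userland workaround that doesn't depend on any kernel tunable is to pace the heavy job itself, for example by piping tar through a rate limiter: once the pipe buffer fills, tar's writes block, so it stops reading from disk until the limiter catches up. The Python sketch below is hypothetical (throttled_copy and the 2 MB/s cap are illustrative choices, not anything from FreeBSD or from this thread); it sleeps just long enough to keep the average rate since the start at or below the target.

```python
import sys
import time


def throttled_copy(src, dst, rate_bytes_per_s, chunk_size=64 * 1024,
                   clock=time.monotonic, sleep=time.sleep):
    """Copy binary stream src to dst, pacing writes so the average rate since
    the start stays at or below rate_bytes_per_s. Returns bytes copied.

    clock and sleep are injectable so the pacing can be tested without waiting.
    """
    start = clock()
    copied = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            return copied
        dst.write(chunk)
        copied += len(chunk)
        # Earliest moment at which `copied` bytes are allowed to have gone out.
        due = start + copied / rate_bytes_per_s
        delay = due - clock()
        if delay > 0:
            sleep(delay)


if __name__ == "__main__":
    # Hypothetical cap of 2 MB/s; adjust to taste.
    throttled_copy(sys.stdin.buffer, sys.stdout.buffer, 2 * 1024 * 1024)
    sys.stdout.buffer.flush()
```

Something like `tar cf - somedir | python3 throttle.py > somedir.tar` (names hypothetical) would cap that archive run at about 2 MB/s. It limits only the one job it wraps; it does nothing to stop a different process from saturating the disk.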
--
Jeff Kramer
jeffk at well.com
http://www.keika.org/