Big Problem, Load Avg Very High
Steven Adams
steve at drifthost.com
Fri Nov 5 22:23:36 PST 2004
Hi Dan,
I disabled one of my clients' web pages (the one that gets the most hits)
and it's gone down to 130MB of swap used and is staying there. The client's
site is just a small site with one page that shows one gallery, but it gets a
lot of hits...
It now has 140MB free, but it seems to slowly drop, then come back, then drop
again.
The weird thing is that the client's site has been up for months and is
getting the same number of hits. The load wasn't this bad before, though it
was still around 1-2.
I used to run exactly the same setup on a different server with 1 CPU and
512MB RAM running Slackware Linux, and it ran fine.
That's why it's confusing me. I'll give systat a go.
I was going to do a make buildworld, update to FreeBSD 5.3 and see if that
fixes it.
But I'm not too sure if I should. I've been told it's getting released soon,
so maybe it's OK to update?
Here's the ps aux output:
============================================
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
root 11 98.1 0.0 0 12 ?? RL Fri01PM 1627:28.24 (idle: cpu3)
root 12 97.3 0.0 0 12 ?? RL Fri01PM 1614:05.04 (idle: cpu2)
root 13 95.2 0.0 0 12 ?? RL Fri01PM 1571:36.41 (idle: cpu1)
root 14 89.6 0.0 0 12 ?? RL Fri01PM 1488:33.38 (idle: cpu0)
mysql 56065 1.3 3.1 74012 32480 ?? S 4:43PM 0:24.23 /usr/local/libexec/mysqld --basedir=/usr/local --datadir=/va
nobody 60870 1.1 1.3 18644 13592 ?? S 5:07PM 0:04.83 /usr/local/apache/bin/httpd -DSSL
jeneaux 62637 1.1 0.3 6192 2636 ?? S 5:20PM 0:00.03 cppop - serving 203.122.211.211 - TRANSACTION - jeneaux (cpp
root 29 0.0 0.0 0 12 ?? WL Fri01PM 5:19.11 (irq16: bge0)
root 37 0.0 0.0 0 12 ?? WL Fri01PM 1:20.92 (irq24: amr0)
root 62 0.0 0.0 0 12 ?? WL Fri01PM 5:23.31 (swi1: net)
root 63 0.0 0.0 0 12 ?? WL Fri01PM 3:49.56 (swi8: clock)
root 2 0.0 0.0 0 12 ?? DL Fri01PM 0:08.92 (g_event)
root 3 0.0 0.0 0 12 ?? DL Fri01PM 2:38.58 (g_up)
root 4 0.0 0.0 0 12 ?? DL Fri01PM 3:33.41 (g_down)
root 65 0.0 0.0 0 12 ?? DL Fri01PM 0:48.94 (random)
root 66 0.0 0.0 0 12 ?? WL Fri01PM 1:04.29 (swi6:+)
root 5 0.0 0.0 0 12 ?? DL Fri01PM 0:00.00 (taskqueue)
root 68 0.0 0.0 0 12 ?? WL Fri01PM 0:00.00 (swi7: acpitaskq)
root 70 0.0 0.0 0 12 ?? WL Fri01PM 0:00.00 (swi3: cambio)
root 71 0.0 0.0 0 12 ?? WL Fri01PM 0:00.00 (swi7: task queue)
root 6 0.0 0.0 0 12 ?? IL Fri01PM 0:00.00 (acpi_task0)
root 7 0.0 0.0 0 12 ?? IL Fri01PM 0:00.00 (acpi_task1)
root 8 0.0 0.0 0 12 ?? IL Fri01PM 0:00.00 (acpi_task2)
root 9 0.0 0.0 0 12 ?? DL Fri01PM 45:03.28 (pagedaemon)
root 72 0.0 0.0 0 12 ?? DL Fri01PM 11:07.16 (vmdaemon)
root 73 0.0 0.0 0 12 ?? DL Fri01PM 4:20.71 (pagezero)
root 74 0.0 0.0 0 12 ?? DL Fri01PM 0:06.06 (bufdaemon)
root 75 0.0 0.0 0 12 ?? DL Fri01PM 1:40.70 (syncer)
root 76 0.0 0.0 0 12 ?? DL Fri01PM 0:01.80 (vnlru)
root 571 0.0 0.1 4348 1172 con- S Fri01PM 0:06.88 rrdtimer (perl)
root 573 0.0 0.0 976 332 ?? Ss Fri01PM 0:16.94 /usr/sbin/MegaServ MegaCtrl
root 621 0.0 0.1 4892 1260 con- S Fri01PM 0:17.42 perl ./read-data.pl start part
root 636 0.0 0.3 6812 2692 con- S Fri01PM 1:31.68 perl ./read-data.pl start system
root 650 0.0 0.1 4868 1236 con- S Fri01PM 0:09.02 perl ./read-data.pl start traffic
root 668 0.0 0.0 1372 204 ?? Ss Fri01PM 0:00.72 /usr/sbin/cron
bind 691 0.0 0.8 13396 7848 ?? Ss Fri01PM 0:53.57 /usr/sbin/named -u bind -c /etc/named.conf
mailnull 707 0.0 0.0 5352 360 ?? Is Fri01PM 0:00.76 /usr/sbin/sendmail -bd -q30m (exim-4.42-0)
mailnull 711 0.0 0.0 5352 12 ?? Is Fri01PM 0:00.00 /usr/sbin/sendmail -tls-on-connect -bd -oX 465 (exim-4.42-0)
root 713 0.0 0.1 2412 1084 con- S Fri01PM 0:19.81 antirelayd (perl)
root 727 0.0 0.0 19680 12 ?? Is Fri01PM 0:00.48 /usr/bin/spamd -d --allowed-ips=127.0.0.1 --pidfile=/var/run
root 734 0.0 1.5 23740 15168 ?? I Fri01PM 0:36.61 spamd child (perl)
root 735 0.0 1.5 24080 15512 ?? I Fri01PM 0:36.18 spamd child (perl)
root 736 0.0 1.5 23972 15416 ?? I Fri01PM 1:24.11 spamd child (perl)
root 737 0.0 1.5 23776 15824 ?? I Fri01PM 0:58.46 spamd child (perl)
root 738 0.0 1.4 25260 14804 ?? I Fri01PM 0:35.16 spamd child (perl)
nobody 774 0.0 0.0 2352 240 ?? Ss Fri01PM 0:01.16 proftpd: (accepting connections) (proftpd)
root 824 0.0 0.7 12916 7012 con- IN Fri01PM 0:54.40 cpanellogd - sleeping for logs (perl)
root 837 0.0 0.1 6120 1280 con- S Fri01PM 0:04.44 cppop - accepting on port 110 (cppop)
nobody 915 0.0 0.0 1164 12 con- I Fri01PM 0:00.00 /usr/local/cpanel/bin/startmelange (melange)
nobody 917 0.0 0.0 2820 12 con- I Fri01PM 0:00.00 entropychat (perl)
root 1009 0.0 0.0 1276 12 v2 Is+ Fri01PM 0:00.00 /usr/libexec/getty Pc ttyv2
root 1010 0.0 0.0 1276 12 v3 Is+ Fri01PM 0:00.00 /usr/libexec/getty Pc ttyv3
root 1011 0.0 0.0 1276 12 v4 Is+ Fri01PM 0:00.00 /usr/libexec/getty Pc ttyv4
root 1012 0.0 0.0 1276 12 v5 Is+ Fri01PM 0:00.00 /usr/libexec/getty Pc ttyv5
root 1013 0.0 0.0 1276 12 v6 Is+ Fri01PM 0:00.00 /usr/libexec/getty Pc ttyv6
root 1014 0.0 0.0 1276 12 v7 Is+ Fri01PM 0:00.00 /usr/libexec/getty Pc ttyv7
root 1253 0.0 0.0 1416 12 ?? Is Fri01PM 0:00.12 /usr/sbin/inetd -wW
root 2775 0.0 0.0 3492 12 ?? Is Fri01PM 0:00.09 /usr/sbin/sshd
drift 20013 0.0 0.0 2060 504 ?? Is Fri03PM 0:01.07 imapd
root 85347 0.0 0.0 10808 12 ?? Is 10:40AM 0:00.00 /usr/sbin/clamd
root 85388 0.0 0.1 2312 1008 ?? S 10:40AM 0:02.86 antirelayd (perl)
cpanel 10149 0.0 0.0 3020 12 ?? Is 10:56AM 0:00.00 /usr/bin/stunnel-4.04local /usr/local/cpanel/etc/stunnel/def
root 10168 0.0 0.1 7464 1204 ?? I 10:56AM 0:00.49 cpsrvd - waiting for connections (cpsrvd)
mailman 18434 0.0 0.0 8036 12 ?? Is 12:03PM 0:00.02 /usr/bin/python2 /usr/local/cpanel/3rdparty/mailman/bin/mail
mailman 18435 0.0 0.1 7996 1280 ?? S 12:03PM 0:05.33 /usr/bin/python2 /usr/local/cpanel/3rdparty/mailman/bin/qrun
mailman 18436 0.0 0.1 8000 1304 ?? S 12:03PM 0:05.65 /usr/bin/python2 /usr/local/cpanel/3rdparty/mailman/bin/qrun
mailman 18437 0.0 0.1 8000 1252 ?? S 12:03PM 0:05.47 /usr/bin/python2 /usr/local/cpanel/3rdparty/mailman/bin/qrun
mailman 18438 0.0 0.1 7996 1276 ?? S 12:03PM 0:05.36 /usr/bin/python2 /usr/local/cpanel/3rdparty/mailman/bin/qrun
mailman 18439 0.0 0.1 8008 1272 ?? S 12:03PM 0:05.44 /usr/bin/python2 /usr/local/cpanel/3rdparty/mailman/bin/qrun
mailman 18440 0.0 0.1 8040 1300 ?? S 12:03PM 0:05.84 /usr/bin/python2 /usr/local/cpanel/3rdparty/mailman/bin/qrun
mailman 18441 0.0 0.1 7996 1268 ?? S 12:03PM 0:05.39 /usr/bin/python2 /usr/local/cpanel/3rdparty/mailman/bin/qrun
mailman 18442 0.0 0.1 7996 872 ?? I 12:03PM 0:00.43 /usr/bin/python2 /usr/local/cpanel/3rdparty/mailman/bin/qrun
root 18450 0.0 0.0 15364 12 ?? Is 12:03PM 0:00.01 /usr/local/cpanel/whostmgr/bin/whostmgr2 ./dotweaksettings
root 18490 0.0 0.6 8772 6596 ?? IN 12:04PM 0:03.72 cpanellogd - sleeping for logs (perl)
root 18492 0.0 0.5 11644 5580 ?? IN 12:04PM 0:05.24 cpanellogd - sleeping for logs (perl)
root 18501 0.0 0.1 6124 580 ?? S 12:04PM 0:00.25 cppop - accepting on port 110 (cppop)
root 18503 0.0 0.1 6124 580 ?? S 12:04PM 0:00.25 cppop - accepting on port 110 (cppop)
root 18504 0.0 0.1 7480 568 ?? S 12:04PM 0:00.98 cpsrvd - waiting for connections (cpsrvd)
root 27335 0.0 0.0 1316 296 ?? Ss 1:08PM 0:02.07 syslogd
root 44669 0.0 0.0 1632 12 v1 Is 3:32PM 0:00.03 login [pam] (login)
root 44670 0.0 0.0 1276 12 v0 Is+ 3:32PM 0:00.00 /usr/libexec/getty Pc ttyv0
root 45124 0.0 0.0 1900 12 v1 I+ 3:36PM 0:00.05 -bash (bash)
root 49672 0.0 0.0 6228 12 ?? Is 4:07PM 0:00.07 sshd: steve [priv] (sshd)
steve 49675 0.0 0.1 6268 656 ?? I 4:07PM 0:00.72 sshd: steve at ttyp0 (sshd)
steve 49676 0.0 0.0 1900 12 p0 Is 4:07PM 0:00.02 -bash (bash)
root 49697 0.0 0.0 1644 12 p0 I 4:07PM 0:00.02 su -
root 49701 0.0 0.1 1900 540 p0 I+ 4:07PM 0:00.05 -su (bash)
root 51414 0.0 0.0 6228 12 ?? Is 4:19PM 0:00.08 sshd: steve [priv] (sshd)
steve 51462 0.0 0.1 6268 688 ?? S 4:19PM 0:01.20 sshd: steve at ttyp1 (sshd)
steve 51467 0.0 0.0 1900 12 p1 Is 4:19PM 0:00.02 -bash (bash)
root 52125 0.0 0.0 1644 12 p1 I 4:27PM 0:00.01 su -
root 52145 0.0 0.1 1916 1168 p1 S 4:28PM 0:00.32 -su (bash)
root 89410 0.0 0.2 2248 1592 ?? I 4:51PM 0:00.01 postsuexecinstall - searching for suexec problems (1572 min
root 56341 0.0 0.2 2248 1592 ?? S 4:59PM 0:00.01 postsuexecinstall - searching for suexec problems (1579 min
root 60838 0.0 0.8 13468 8112 ?? Ss 5:07PM 0:00.51 /usr/local/apache/bin/httpd -DSSL
nobody 60844 0.0 1.4 19224 14164 ?? S 5:07PM 0:04.34 /usr/local/apache/bin/httpd -DSSL
nobody 60845 0.0 1.2 18044 13008 ?? S 5:07PM 0:03.09 /usr/local/apache/bin/httpd -DSSL
nobody 60846 0.7 1.2 18024 12980 ?? S 5:07PM 0:01.82 /usr/local/apache/bin/httpd -DSSL
nobody 60847 0.0 1.2 18092 13040 ?? I 5:07PM 0:01.82 /usr/local/apache/bin/httpd -DSSL
nobody 60848 0.0 1.2 17984 12932 ?? I 5:07PM 0:02.43 /usr/local/apache/bin/httpd -DSSL
nobody 60849 0.0 1.4 19596 14584 ?? I 5:07PM 0:03.44 /usr/local/apache/bin/httpd -DSSL
nobody 60865 0.7 1.2 18024 12984 ?? S 5:07PM 0:03.29 /usr/local/apache/bin/httpd -DSSL
nobody 60866 0.0 1.2 18008 12992 ?? S 5:07PM 0:04.36 /usr/local/apache/bin/httpd -DSSL
nobody 60868 0.6 1.2 18004 12968 ?? S 5:07PM 0:04.17 /usr/local/apache/bin/httpd -DSSL
nobody 60869 0.7 1.3 18456 13444 ?? S 5:07PM 0:04.17 /usr/local/apache/bin/httpd -DSSL
nobody 60871 0.0 1.2 18020 12976 ?? I 5:07PM 0:02.60 /usr/local/apache/bin/httpd -DSSL
nobody 61115 0.4 1.2 17976 12932 ?? S 5:09PM 0:01.48 /usr/local/apache/bin/httpd -DSSL
nobody 61518 0.1 1.2 17512 12500 ?? S 5:12PM 0:00.72 /usr/local/apache/bin/httpd -DSSL
nobody 61957 0.0 1.2 18036 12980 ?? S 5:15PM 0:01.20 /usr/local/apache/bin/httpd -DSSL
nobody 61975 0.5 1.2 17652 12636 ?? S 5:15PM 0:01.56 /usr/local/apache/bin/httpd -DSSL
nobody 61976 0.0 1.2 17812 12800 ?? S 5:15PM 0:01.19 /usr/local/apache/bin/httpd -DSSL
nobody 62128 0.0 0.8 13516 8184 ?? I 5:17PM 0:00.01 /usr/local/apache/bin/httpd -DSSL
nobody 62350 0.2 1.4 19260 14208 ?? S 5:17PM 0:01.90 /usr/local/apache/bin/httpd -DSSL
nobody 62461 0.8 1.2 17916 12800 ?? S 5:18PM 0:00.66 /usr/local/apache/bin/httpd -DSSL
root 0 0.0 0.0 0 0 ?? ZW - 0:00.00 (pstat)
root 62655 0.0 0.1 1432 808 p1 R+ 5:20PM 0:00.00 ps auxf
root 0 0.0 0.0 0 4 ?? DLs Fri01PM 0:01.83 (swapper)
root 10 0.0 0.0 0 12 ?? DL Fri01PM 0:00.00 (ktrace)
root 1 0.0 0.0 760 84 ?? ILs Fri01PM 0:00.86 /sbin/init
--
root 15 0.0 0.0 0 12 ?? WL Fri01PM 0:00.06 (irq1: atkbd0)
root 19 0.0 0.0 0 12 ?? WL Fri01PM 0:00.00 (irq6: fdc0)
root 28 0.0 0.0 0 12 ?? WL Fri01PM 0:01.70 (irq15: ata1)
================================================
Steven Adams steve at drifthost.com
DriftNet Web Services http://www.drifthost.com
Home: +61 2 94274857
Fax: +61 2 94274857
Mobile +61 (0) 404 085644
-----Original Message-----
From: Dan Nelson [mailto:dnelson at allantgroup.com]
Sent: Saturday, 6 November 2004 5:08 PM
To: steve at drifthost.com
Cc: questions at freebsd.org
Subject: Re: Big Problem, Load Avg Very High
In the last episode (Nov 06), Steven Adams said:
> We host a couple of sites on this server (not very big sites). My
> server load is always around 0.90 - 3.40. Sometimes it will jump up
> to 10-15.
>
> At random it will jump up to a load of 30-40 and I won't even be able to
> get to the server; typing commands on the remote IP-based KVM is VERY
> slow, sometimes missing letters. As soon as I'm able to get top running
> it shows
>
> ====================================
>
> last pid: 52614; load averages: 6.82, 15.75, 15.18 up 1+03:07:12 16:32:22
>
> 462 processes: 1 running, 460 sleeping, 1 zombie
> CPU states: 0.0% user, 0.0% nice, 0.6% system, 0.6% interrupt, 98.7% idle
>
> Mem: 615M Active, 68M Inact, 288M Wired, 29M Cache, 112M Buf, 1844K Free
> Swap: 1536M Total, 555M Used, 981M Free, 36% Inuse, 12K In
500MB of swap used? You might have a process that's allocating too
much memory and causing the rest of the processes to swap to disk. Try
keeping a top session running all the time so you can monitor swap
usage and see if you notice any processes taking more memory than they
should.
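To see which programs hold the most resident memory in a snapshot like the ps aux listing above, one option is to total the RSS column per command name. This is a minimal Python sketch, assuming the FreeBSD ps aux column layout shown in this thread (RSS is the sixth column, in kilobytes); the function name rss_by_command is hypothetical, not part of any tool.

```python
import os
from collections import defaultdict


def rss_by_command(ps_output: str) -> list[tuple[str, int, int]]:
    """Total resident memory per command name in FreeBSD `ps aux` output.

    Returns (command, process_count, total_rss_kb) tuples, largest first.
    """
    totals = defaultdict(lambda: [0, 0])  # command -> [process count, RSS in KB]
    for line in ps_output.splitlines():
        # USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
        fields = line.split(None, 10)
        # Skip the header row and any line that is not a complete ps row.
        if len(fields) < 11 or not fields[5].isdigit():
            continue
        command = os.path.basename(fields[10].split()[0])
        totals[command][0] += 1
        totals[command][1] += int(fields[5])
    return sorted(
        ((cmd, count, rss) for cmd, (count, rss) in totals.items()),
        key=lambda row: row[2],
        reverse=True,
    )
```

Feeding it the output of `ps auxww` (the doubled w stops FreeBSD's ps from truncating long command lines) and printing the first few rows shows at a glance whether httpd, mysqld or the spamd children dominate. RSS counts shared pages once per process, so for forked servers such as httpd the totals overstate real usage; treat them as an upper bound.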
> I did notice once, when running systat -vmstat, that amr0 (SCSI RAID) jumps up
> to 99% busy copying 2-3MB/s for a few moments, then goes back down.
That could be either regular disk activity or swap thrashing. "vmstat
1" will tell you (watch the fre, pi and po columns).
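The vmstat check can be made repeatable with a small parser. This sketch assumes FreeBSD 5.x-style output in which the header row names the fre, pi and po columns and each data row holds one plain integer per column; the helper names and the "po above zero in every recent sample" threshold are illustrative assumptions, not an established rule.

```python
def paging_samples(vmstat_output: str) -> list[dict[str, int]]:
    """Pull the fre, pi and po columns out of each data row of `vmstat 1` output.

    Assumes a header row naming fre, pi and po, and one whitespace-separated
    integer per column in each data row (FreeBSD 5.x style).
    """
    wanted = ("fre", "pi", "po")
    columns = None  # column name -> index, learned from the header row
    samples = []
    for line in vmstat_output.splitlines():
        tokens = line.split()
        if set(wanted) <= set(tokens):
            # Column header row; vmstat reprints it periodically.
            columns = {name: tokens.index(name) for name in wanted}
            continue
        if columns is None or len(tokens) <= max(columns.values()):
            continue  # group-title row such as "procs memory page ..."
        values = {name: tokens[i] for name, i in columns.items()}
        if all(v.isdigit() for v in values.values()):
            samples.append({name: int(v) for name, v in values.items()})
    return samples


def thrashing_suspected(samples: list[dict[str, int]], window: int = 5) -> bool:
    """Heuristic (an assumption, not a kernel rule): pages were written out
    to swap in every one of the last `window` samples."""
    recent = samples[-window:]
    return len(recent) == window and all(s["po"] > 0 for s in recent)
```

The input could be `vmstat 1` redirected to a file for a minute or so. On BSD systems the first data row generally summarizes activity since boot, so it may be worth discarding before applying the heuristic.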
> After one of the times it went to a load of 50, I got this on the console
> screen. FYI: amrd0s1h is the /home partition.
>
> swap_pager: indefinite wait buffer: device: amrd0s1h, blkno: 103776, size: 32768
> swap_pager: indefinite wait buffer: device: amrd0s1h, blkno: 130801, size: 4096
Most likely you're thrashing. I've seen a couple of other people mention
this error with 5.2.1, but not lately, so chances are 5.3 has fixed
this particular problem.
> It seems that it's copying a lot of information to the swap drive and
> is running out of RAM. I don't know why, but it seems Apache is
> taking up all of the RAM for some weird reason?
Not weird at all. If you are using Perl or PHP modules, they can
really suck up RAM if you get a lot of page hits at once. You might
want to look at using FastCGI to separate Perl/PHP from the Apache
process itself.
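The point about page hits can be turned into rough arithmetic: each concurrent request ties up one httpd child, so the number of children that fit in memory is roughly the RAM you can spare for Apache divided by the size of one child. This sketch sizes against the largest child RSS observed (the httpd children in the ps aux listing above sit around 12-14 MB); the function name and the 400 MB budget in the example are hypothetical, and the result is only a starting point for Apache's MaxClients setting.

```python
def apache_child_budget(httpd_rss_kb: list[int], ram_budget_mb: int) -> int:
    """Estimate how many httpd children fit in ram_budget_mb without swapping.

    Sizes against the largest child observed, since children running
    mod_perl/mod_php can grow as they serve heavier requests. RSS also counts
    pages shared with the parent, so the estimate errs on the low side.
    """
    if not httpd_rss_kb:
        raise ValueError("need at least one httpd RSS sample")
    worst_child_kb = max(httpd_rss_kb)
    return (ram_budget_mb * 1024) // worst_child_kb


# RSS values (KB) copied from a few httpd children in the ps aux listing.
observed = [13592, 14164, 13008, 14584, 12980]
print(apache_child_budget(observed, ram_budget_mb=400))  # prints 28; 400 MB is a hypothetical budget
```

If MaxClients is set well above the figure this gives, a burst of hits can push the machine into swap even when average load looks modest.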
--
Dan Nelson
dnelson at allantgroup.com
_______________________________________________
freebsd-questions at freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "freebsd-questions-unsubscribe at freebsd.org"