Big Problem, Load Avg Very High

Steven Adams steve at drifthost.com
Fri Nov 5 22:23:36 PST 2004


Hi Dan,

I disabled one of my client's web pages (the one that gets the most hits)
and swap usage has gone down to 130MB and is staying there. The client's site
is just a small site with one page that shows one gallery, but it gets a lot
of hits.

It now has 140MB free, but it seems to slowly drop, then come back, then drop
again.

The weird thing is that the client's site has been up for months and is
getting the same hits. The load wasn't this bad before, though it was still
around 1-2.

I used to run all the same things on a different server with 1 CPU and 512MB
RAM on Slackware Linux, and it ran fine.

That's why it's confusing me. I'll give systat a go.

I was going to do a make buildworld and update to FreeBSD 5.3 to see if
that fixes it.

But I'm not too sure if I should. I've been told it's getting released soon,
so maybe it's OK to update?

Here's the ps aux output:

============================================
USER       PID %CPU %MEM   VSZ  RSS  TT  STAT STARTED      TIME COMMAND
root        11 98.1  0.0     0   12  ??  RL   Fri01PM 1627:28.24  (idle: cpu3)
root        12 97.3  0.0     0   12  ??  RL   Fri01PM 1614:05.04  (idle: cpu2)
root        13 95.2  0.0     0   12  ??  RL   Fri01PM 1571:36.41  (idle: cpu1)
root        14 89.6  0.0     0   12  ??  RL   Fri01PM 1488:33.38  (idle: cpu0)
mysql    56065  1.3  3.1 74012 32480  ??  S     4:43PM   0:24.23 /usr/local/libexec/mysqld --basedir=/usr/local --datadir=/va
nobody   60870  1.1  1.3 18644 13592  ??  S     5:07PM   0:04.83 /usr/local/apache/bin/httpd -DSSL
jeneaux  62637  1.1  0.3  6192 2636  ??  S     5:20PM   0:00.03 cppop - serving 203.122.211.211 - TRANSACTION - jeneaux (cpp
root        29  0.0  0.0     0   12  ??  WL   Fri01PM   5:19.11  (irq16: bge0)
root        37  0.0  0.0     0   12  ??  WL   Fri01PM   1:20.92  (irq24: amr0)
root        62  0.0  0.0     0   12  ??  WL   Fri01PM   5:23.31  (swi1: net)
root        63  0.0  0.0     0   12  ??  WL   Fri01PM   3:49.56  (swi8: clock)
root         2  0.0  0.0     0   12  ??  DL   Fri01PM   0:08.92  (g_event)
root         3  0.0  0.0     0   12  ??  DL   Fri01PM   2:38.58  (g_up)
root         4  0.0  0.0     0   12  ??  DL   Fri01PM   3:33.41  (g_down)
root        65  0.0  0.0     0   12  ??  DL   Fri01PM   0:48.94  (random)
root        66  0.0  0.0     0   12  ??  WL   Fri01PM   1:04.29  (swi6:+)
root         5  0.0  0.0     0   12  ??  DL   Fri01PM   0:00.00  (taskqueue)
root        68  0.0  0.0     0   12  ??  WL   Fri01PM   0:00.00  (swi7: acpitaskq)
root        70  0.0  0.0     0   12  ??  WL   Fri01PM   0:00.00  (swi3: cambio)
root        71  0.0  0.0     0   12  ??  WL   Fri01PM   0:00.00  (swi7: task queue)
root         6  0.0  0.0     0   12  ??  IL   Fri01PM   0:00.00  (acpi_task0)
root         7  0.0  0.0     0   12  ??  IL   Fri01PM   0:00.00  (acpi_task1)
root         8  0.0  0.0     0   12  ??  IL   Fri01PM   0:00.00  (acpi_task2)
root         9  0.0  0.0     0   12  ??  DL   Fri01PM  45:03.28  (pagedaemon)
root        72  0.0  0.0     0   12  ??  DL   Fri01PM  11:07.16  (vmdaemon)
root        73  0.0  0.0     0   12  ??  DL   Fri01PM   4:20.71  (pagezero)
root        74  0.0  0.0     0   12  ??  DL   Fri01PM   0:06.06  (bufdaemon)
root        75  0.0  0.0     0   12  ??  DL   Fri01PM   1:40.70  (syncer)
root        76  0.0  0.0     0   12  ??  DL   Fri01PM   0:01.80  (vnlru)
root       571  0.0  0.1  4348 1172 con- S    Fri01PM   0:06.88 rrdtimer (perl)
root       573  0.0  0.0   976  332  ??  Ss   Fri01PM   0:16.94 /usr/sbin/MegaServ MegaCtrl
root       621  0.0  0.1  4892 1260 con- S    Fri01PM   0:17.42 perl ./read-data.pl start part
root       636  0.0  0.3  6812 2692 con- S    Fri01PM   1:31.68 perl ./read-data.pl start system
root       650  0.0  0.1  4868 1236 con- S    Fri01PM   0:09.02 perl ./read-data.pl start traffic
root       668  0.0  0.0  1372  204  ??  Ss   Fri01PM   0:00.72 /usr/sbin/cron
bind       691  0.0  0.8 13396 7848  ??  Ss   Fri01PM   0:53.57 /usr/sbin/named -u bind -c /etc/named.conf
mailnull   707  0.0  0.0  5352  360  ??  Is   Fri01PM   0:00.76 /usr/sbin/sendmail -bd -q30m (exim-4.42-0)
mailnull   711  0.0  0.0  5352   12  ??  Is   Fri01PM   0:00.00 /usr/sbin/sendmail -tls-on-connect -bd -oX 465 (exim-4.42-0)
root       713  0.0  0.1  2412 1084 con- S    Fri01PM   0:19.81 antirelayd (perl)
root       727  0.0  0.0 19680   12  ??  Is   Fri01PM   0:00.48 /usr/bin/spamd -d --allowed-ips=127.0.0.1 --pidfile=/var/run
root       734  0.0  1.5 23740 15168  ??  I    Fri01PM   0:36.61 spamd child (perl)
root       735  0.0  1.5 24080 15512  ??  I    Fri01PM   0:36.18 spamd child (perl)
root       736  0.0  1.5 23972 15416  ??  I    Fri01PM   1:24.11 spamd child (perl)
root       737  0.0  1.5 23776 15824  ??  I    Fri01PM   0:58.46 spamd child (perl)
root       738  0.0  1.4 25260 14804  ??  I    Fri01PM   0:35.16 spamd child (perl)
nobody     774  0.0  0.0  2352  240  ??  Ss   Fri01PM   0:01.16 proftpd: (accepting connections) (proftpd)
root       824  0.0  0.7 12916 7012 con- IN   Fri01PM   0:54.40 cpanellogd - sleeping for logs (perl)
root       837  0.0  0.1  6120 1280 con- S    Fri01PM   0:04.44 cppop - accepting on port 110 (cppop)
nobody     915  0.0  0.0  1164   12 con- I    Fri01PM   0:00.00 /usr/local/cpanel/bin/startmelange (melange)
nobody     917  0.0  0.0  2820   12 con- I    Fri01PM   0:00.00 entropychat (perl)
root      1009  0.0  0.0  1276   12  v2  Is+  Fri01PM   0:00.00 /usr/libexec/getty Pc ttyv2
root      1010  0.0  0.0  1276   12  v3  Is+  Fri01PM   0:00.00 /usr/libexec/getty Pc ttyv3
root      1011  0.0  0.0  1276   12  v4  Is+  Fri01PM   0:00.00 /usr/libexec/getty Pc ttyv4
root      1012  0.0  0.0  1276   12  v5  Is+  Fri01PM   0:00.00 /usr/libexec/getty Pc ttyv5
root      1013  0.0  0.0  1276   12  v6  Is+  Fri01PM   0:00.00 /usr/libexec/getty Pc ttyv6
root      1014  0.0  0.0  1276   12  v7  Is+  Fri01PM   0:00.00 /usr/libexec/getty Pc ttyv7
root      1253  0.0  0.0  1416   12  ??  Is   Fri01PM   0:00.12 /usr/sbin/inetd -wW
root      2775  0.0  0.0  3492   12  ??  Is   Fri01PM   0:00.09 /usr/sbin/sshd
drift    20013  0.0  0.0  2060  504  ??  Is   Fri03PM   0:01.07 imapd
root     85347  0.0  0.0 10808   12  ??  Is   10:40AM   0:00.00 /usr/sbin/clamd
root     85388  0.0  0.1  2312 1008  ??  S    10:40AM   0:02.86 antirelayd (perl)
cpanel   10149  0.0  0.0  3020   12  ??  Is   10:56AM   0:00.00 /usr/bin/stunnel-4.04local /usr/local/cpanel/etc/stunnel/def
root     10168  0.0  0.1  7464 1204  ??  I    10:56AM   0:00.49 cpsrvd - waiting for connections (cpsrvd)
mailman  18434  0.0  0.0  8036   12  ??  Is   12:03PM   0:00.02 /usr/bin/python2 /usr/local/cpanel/3rdparty/mailman/bin/mail
mailman  18435  0.0  0.1  7996 1280  ??  S    12:03PM   0:05.33 /usr/bin/python2 /usr/local/cpanel/3rdparty/mailman/bin/qrun
mailman  18436  0.0  0.1  8000 1304  ??  S    12:03PM   0:05.65 /usr/bin/python2 /usr/local/cpanel/3rdparty/mailman/bin/qrun
mailman  18437  0.0  0.1  8000 1252  ??  S    12:03PM   0:05.47 /usr/bin/python2 /usr/local/cpanel/3rdparty/mailman/bin/qrun
mailman  18438  0.0  0.1  7996 1276  ??  S    12:03PM   0:05.36 /usr/bin/python2 /usr/local/cpanel/3rdparty/mailman/bin/qrun
mailman  18439  0.0  0.1  8008 1272  ??  S    12:03PM   0:05.44 /usr/bin/python2 /usr/local/cpanel/3rdparty/mailman/bin/qrun
mailman  18440  0.0  0.1  8040 1300  ??  S    12:03PM   0:05.84 /usr/bin/python2 /usr/local/cpanel/3rdparty/mailman/bin/qrun
mailman  18441  0.0  0.1  7996 1268  ??  S    12:03PM   0:05.39 /usr/bin/python2 /usr/local/cpanel/3rdparty/mailman/bin/qrun
mailman  18442  0.0  0.1  7996  872  ??  I    12:03PM   0:00.43 /usr/bin/python2 /usr/local/cpanel/3rdparty/mailman/bin/qrun
root     18450  0.0  0.0 15364   12  ??  Is   12:03PM   0:00.01 /usr/local/cpanel/whostmgr/bin/whostmgr2 ./dotweaksettings
root     18490  0.0  0.6  8772 6596  ??  IN   12:04PM   0:03.72 cpanellogd - sleeping for logs (perl)
root     18492  0.0  0.5 11644 5580  ??  IN   12:04PM   0:05.24 cpanellogd - sleeping for logs (perl)
root     18501  0.0  0.1  6124  580  ??  S    12:04PM   0:00.25 cppop - accepting on port 110 (cppop)
root     18503  0.0  0.1  6124  580  ??  S    12:04PM   0:00.25 cppop - accepting on port 110 (cppop)
root     18504  0.0  0.1  7480  568  ??  S    12:04PM   0:00.98 cpsrvd - waiting for connections (cpsrvd)
root     27335  0.0  0.0  1316  296  ??  Ss    1:08PM   0:02.07 syslogd
root     44669  0.0  0.0  1632   12  v1  Is    3:32PM   0:00.03 login [pam] (login)
root     44670  0.0  0.0  1276   12  v0  Is+   3:32PM   0:00.00 /usr/libexec/getty Pc ttyv0
root     45124  0.0  0.0  1900   12  v1  I+    3:36PM   0:00.05 -bash (bash)
root     49672  0.0  0.0  6228   12  ??  Is    4:07PM   0:00.07 sshd: steve [priv] (sshd)
steve    49675  0.0  0.1  6268  656  ??  I     4:07PM   0:00.72 sshd: steve at ttyp0 (sshd)
steve    49676  0.0  0.0  1900   12  p0  Is    4:07PM   0:00.02 -bash (bash)
root     49697  0.0  0.0  1644   12  p0  I     4:07PM   0:00.02 su -
root     49701  0.0  0.1  1900  540  p0  I+    4:07PM   0:00.05 -su (bash)
root     51414  0.0  0.0  6228   12  ??  Is    4:19PM   0:00.08 sshd: steve [priv] (sshd)
steve    51462  0.0  0.1  6268  688  ??  S     4:19PM   0:01.20 sshd: steve at ttyp1 (sshd)
steve    51467  0.0  0.0  1900   12  p1  Is    4:19PM   0:00.02 -bash (bash)
root     52125  0.0  0.0  1644   12  p1  I     4:27PM   0:00.01 su -
root     52145  0.0  0.1  1916 1168  p1  S     4:28PM   0:00.32 -su (bash)
root     89410  0.0  0.2  2248 1592  ??  I     4:51PM   0:00.01 postsuexecinstall - searching for suexec problems (1572 min
root     56341  0.0  0.2  2248 1592  ??  S     4:59PM   0:00.01 postsuexecinstall - searching for suexec problems (1579 min
root     60838  0.0  0.8 13468 8112  ??  Ss    5:07PM   0:00.51 /usr/local/apache/bin/httpd -DSSL
nobody   60844  0.0  1.4 19224 14164  ??  S     5:07PM   0:04.34 /usr/local/apache/bin/httpd -DSSL
nobody   60845  0.0  1.2 18044 13008  ??  S     5:07PM   0:03.09 /usr/local/apache/bin/httpd -DSSL
nobody   60846  0.7  1.2 18024 12980  ??  S     5:07PM   0:01.82 /usr/local/apache/bin/httpd -DSSL
nobody   60847  0.0  1.2 18092 13040  ??  I     5:07PM   0:01.82 /usr/local/apache/bin/httpd -DSSL
nobody   60848  0.0  1.2 17984 12932  ??  I     5:07PM   0:02.43 /usr/local/apache/bin/httpd -DSSL
nobody   60849  0.0  1.4 19596 14584  ??  I     5:07PM   0:03.44 /usr/local/apache/bin/httpd -DSSL
nobody   60865  0.7  1.2 18024 12984  ??  S     5:07PM   0:03.29 /usr/local/apache/bin/httpd -DSSL
nobody   60866  0.0  1.2 18008 12992  ??  S     5:07PM   0:04.36 /usr/local/apache/bin/httpd -DSSL
nobody   60868  0.6  1.2 18004 12968  ??  S     5:07PM   0:04.17 /usr/local/apache/bin/httpd -DSSL
nobody   60869  0.7  1.3 18456 13444  ??  S     5:07PM   0:04.17 /usr/local/apache/bin/httpd -DSSL
nobody   60871  0.0  1.2 18020 12976  ??  I     5:07PM   0:02.60 /usr/local/apache/bin/httpd -DSSL
nobody   61115  0.4  1.2 17976 12932  ??  S     5:09PM   0:01.48 /usr/local/apache/bin/httpd -DSSL
nobody   61518  0.1  1.2 17512 12500  ??  S     5:12PM   0:00.72 /usr/local/apache/bin/httpd -DSSL
nobody   61957  0.0  1.2 18036 12980  ??  S     5:15PM   0:01.20 /usr/local/apache/bin/httpd -DSSL
nobody   61975  0.5  1.2 17652 12636  ??  S     5:15PM   0:01.56 /usr/local/apache/bin/httpd -DSSL
nobody   61976  0.0  1.2 17812 12800  ??  S     5:15PM   0:01.19 /usr/local/apache/bin/httpd -DSSL
nobody   62128  0.0  0.8 13516 8184  ??  I     5:17PM   0:00.01 /usr/local/apache/bin/httpd -DSSL
nobody   62350  0.2  1.4 19260 14208  ??  S     5:17PM   0:01.90 /usr/local/apache/bin/httpd -DSSL
nobody   62461  0.8  1.2 17916 12800  ??  S     5:18PM   0:00.66 /usr/local/apache/bin/httpd -DSSL
root         0  0.0  0.0     0    0  ??  ZW   -         0:00.00  (pstat)
root     62655  0.0  0.1  1432  808  p1  R+    5:20PM   0:00.00 ps auxf
root         0  0.0  0.0     0    4  ??  DLs  Fri01PM   0:01.83  (swapper)
root        10  0.0  0.0     0   12  ??  DL   Fri01PM   0:00.00  (ktrace)
root         1  0.0  0.0   760   84  ??  ILs  Fri01PM   0:00.86 /sbin/init --
root        15  0.0  0.0     0   12  ??  WL   Fri01PM   0:00.06  (irq1: atkbd0)
root        19  0.0  0.0     0   12  ??  WL   Fri01PM   0:00.00  (irq6: fdc0)
root        28  0.0  0.0     0   12  ??  WL   Fri01PM   0:01.70  (irq15: ata1)
================================================
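
A listing like this is easier to read once resident memory is totalled per
command. What follows is only a rough Python sketch, not something taken from
the thread: the helper name rss_by_command is made up for illustration, and it
assumes every process sits on a single unwrapped line (for example, output
captured with ps auxww). RSS counts shared pages once per process, so the
totals overstate real memory use and are only good for spotting the largest
consumers.

```python
from collections import defaultdict


def rss_by_command(ps_lines):
    """Sum the RSS column (KB) per command for unwrapped `ps aux` lines.

    Hypothetical helper for illustration; assumes the 11-column layout
    USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND.
    """
    totals = defaultdict(int)
    for line in ps_lines:
        fields = line.split(None, 10)
        # Skip the header row and any line that is not a full process entry.
        if len(fields) < 11 or not fields[1].isdigit():
            continue
        rss_kb = int(fields[5])
        command = fields[10].split()[0]  # first word of COMMAND
        totals[command] += rss_kb
    return dict(totals)


# Three entries copied from the listing above, joined onto single lines.
sample = [
    "USER       PID %CPU %MEM   VSZ  RSS  TT  STAT STARTED      TIME COMMAND",
    "mysql    56065  1.3  3.1 74012 32480  ??  S     4:43PM   0:24.23 /usr/local/libexec/mysqld --basedir=/usr/local",
    "nobody   60870  1.1  1.3 18644 13592  ??  S     5:07PM   0:04.83 /usr/local/apache/bin/httpd -DSSL",
    "nobody   60844  0.0  1.4 19224 14164  ??  S     5:07PM   0:04.34 /usr/local/apache/bin/httpd -DSSL",
]

totals = rss_by_command(sample)
for command, kb in sorted(totals.items(), key=lambda item: -item[1]):
    print(f"{kb:>8} KB  {command}")
```

Feeding it the whole listing rather than three lines would show how much of
the resident memory belongs to the httpd children versus everything else.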

Steven Adams steve at drifthost.com 
DriftNet Web Services http://www.drifthost.com 
Home: +61 2 94274857
Fax: +61 2 94274857
Mobile +61 (0) 404 085644

-----Original Message-----
From: Dan Nelson [mailto:dnelson at allantgroup.com] 
Sent: Saturday, 6 November 2004 5:08 PM
To: steve at drifthost.com
Cc: questions at freebsd.org
Subject: Re: Big Problem, Load Avg Very High

In the last episode (Nov 06), Steven Adams said:
> We host a couple of sites on this server (not very big sites). My
> server load is always around 0.90 - 3.40. Sometimes it will jump up
> to 10-15.
> 
> At random it will jump up to a load of 30-40 and I won't even be able
> to get to the server; typing commands on the remote IP-based KVM is
> VERY slow, sometimes missing letters. As soon as I'm able to get top
> running, it shows:
> 
> ====================================
> 
> last pid: 52614;  load averages:  6.82, 15.75, 15.18 up 1+03:07:12  16:32:22
> 
> 462 processes: 1 running, 460 sleeping, 1 zombie
> CPU states:  0.0% user,  0.0% nice,  0.6% system,  0.6% interrupt, 98.7% idle
> 
> Mem: 615M Active, 68M Inact, 288M Wired, 29M Cache, 112M Buf, 1844K Free
> Swap: 1536M Total, 555M Used, 981M Free, 36% Inuse, 12K In

500MB of swap used?  You might have a process that's allocating too
much memory and causing the rest of the processes to swap to disk. Try
keeping a top session running all the time so you can monitor swap
usage and see if you notice any processes taking more memory than they
should.
 
> I did notice once when running systat -vmstat that amr0 (SCSI RAID) jumps
> up to 99% busy, copying 2-3MB/s for a few moments, then goes back down.

That could be either regular disk activity or swap thrashing.  "vmstat
1" will tell you (watch the fre, pi and po columns).
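
To make "watch the fre, pi and po columns" concrete, here is a small Python
sketch that picks those three columns out of captured "vmstat 1" output by
header name. It is an illustration only: the function names are hypothetical,
the sample rows are invented numbers rather than measurements from this
server, and the column layout is assumed to be the usual FreeBSD one where a
header row names each field.

```python
def paging_samples(vmstat_lines, columns=("fre", "pi", "po")):
    """Yield the chosen columns (as strings) for each data row.

    Hypothetical helper: columns are located by name from the header row,
    so the exact number of disk columns does not matter.
    """
    header = None
    for line in vmstat_lines:
        fields = line.split()
        if "fre" in fields and "pi" in fields and "po" in fields:
            header = fields  # vmstat repeats this row periodically
            continue
        # Ignore the group-title row, blank lines, and malformed rows.
        if header is None or len(fields) != len(header):
            continue
        if not fields[0].isdigit():
            continue
        yield {name: fields[header.index(name)] for name in columns}


def is_paging(sample):
    """True when pages are moving in or out during the interval."""
    return int(sample["pi"]) > 0 or int(sample["po"]) > 0


# Invented sample output, for illustration only.
sample_output = """\
 procs      memory      page                    disks     faults      cpu
 r b w     avm    fre  flt  re  pi  po  fr  sr am0   in   sy  cs us sy id
 1 4 0  912340   1844  310   2  45 120 500 900  88  400 1200 600  1  2 97
 0 0 0  912340 140000   12   0   0   0   0   0   1  230  300 200  0  1 99
"""

rows = list(paging_samples(sample_output.splitlines()))
for row in rows:
    state = "paging" if is_paging(row) else "quiet"
    print(row["fre"], row["pi"], row["po"], state)
```

Sustained non-zero pi and po together with a very small fre would point at
swap thrashing rather than ordinary disk activity.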
 
> After one of the times it went to a load of 50, I got this on the console
> screen. FYI: amrd0s1h is the /home partition
> 
> Swap_pager: indefinite wait buffer: device: amrd0s1h, blkno: 103776, size: 32768
> Swap_pager: indefinite wait buffer: device: amrd0s1h, blkno: 130801, size: 4096

Most likely you're thrashing.  I've seen a couple other people mention
this error with 5.2.1, but not lately, so chances are 5.3 has fixed
this particular problem.

> It seems that it's copying a lot of information to the swap drive and
> is running out of RAM. I don't know why; it seems Apache is taking up
> all of the RAM for some weird reason?

Not weird at all.  If you are using perl or php modules, they can
really suck up ram if you get a lot of page hits at once.  You might
want to look at using fastcgi to separate perl/php from the apache
process itself.
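
A back-of-the-envelope calculation shows why a burst of hits can exhaust RAM
when every httpd child carries the perl/php interpreter. The Python sketch
below is only an estimate under stated assumptions: the per-child figure of
about 13000 KB is read off the RSS column of the ps listing, while the
MaxClients value of 150 and the 512 MB budget are hypothetical numbers chosen
for illustration, not settings known from this server. RSS includes shared
pages, so the real footprint per child is somewhat lower.

```python
def apache_worst_case_mb(per_child_rss_kb, max_clients):
    """Rough upper bound on memory if every allowed child is running."""
    return per_child_rss_kb * max_clients / 1024


def max_children_for_budget(budget_mb, per_child_rss_kb):
    """How many children of this size fit in a given memory budget."""
    return (budget_mb * 1024) // per_child_rss_kb


PER_CHILD_RSS_KB = 13000   # approximate httpd RSS from the ps listing
ASSUMED_MAX_CLIENTS = 150  # hypothetical; check the real httpd.conf
ASSUMED_BUDGET_MB = 512    # hypothetical share of RAM left for Apache

worst_case = apache_worst_case_mb(PER_CHILD_RSS_KB, ASSUMED_MAX_CLIENTS)
fits = max_children_for_budget(ASSUMED_BUDGET_MB, PER_CHILD_RSS_KB)

print(f"worst case: about {worst_case:.0f} MB")
print(f"children that fit in {ASSUMED_BUDGET_MB} MB: {fits}")
```

If the worst case comes out well above physical memory, either the child
limit needs lowering or the interpreter needs to move out of the httpd
children, which is what the fastcgi suggestion achieves.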

-- 
	Dan Nelson
	dnelson at allantgroup.com