What is loading my server so much?

Laszlo Nagy gandalf at shopzeus.com
Thu Dec 9 11:49:39 UTC 2010


The system is FreeBSD 8.1-STABLE on amd64:

FreeBSD shopzeus.com 8.1-STABLE FreeBSD 8.1-STABLE #0: Sun Oct 31 02:55:28 EDT 2010     amd64

It has two quad-core Xeon CPUs, 24 GB of memory, and a 10-disk RAID 1+0 
array on an Areca 1680 controller with a 2 GB write-back cache.

The server is running MailScanner, Apache (multihost), PHP and 
PostgreSQL. The main load on the server is usually PostgreSQL.

Today something happened. The number of httpd processes went up to 200. 
As a result, the number of connections to the database also went up to 
200, and the web server is now refusing clients with "Cannot connect to 
database" messages (coming from PHP).
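
A minimal sketch of how such per-command process counts could be 
collected, assuming the input looks like the output of something like 
`ps -axo user,comm`. The function name `count_by_command` and the sample 
listing are hypothetical and only for illustration:

```python
from collections import Counter


def count_by_command(ps_text):
    """Count processes per (user, command) in `ps -axo user,comm`-style text."""
    counts = Counter()
    lines = ps_text.strip().splitlines()
    for line in lines[1:]:  # skip the header line
        fields = line.split(None, 1)
        if len(fields) != 2:
            continue  # ignore malformed or empty lines
        user, command = fields
        counts[(user, command.strip())] += 1
    return counts


# Hypothetical sample, not real output from the server described here.
SAMPLE = """\
USER     COMMAND
www      httpd
www      httpd
pgsql    postgres
www      httpd
pgsql    postgres
"""

if __name__ == "__main__":
    for (user, command), n in count_by_command(SAMPLE).most_common():
        print(f"{n:5d} {user} {command}")
```

Run periodically, a count like this would show whether the httpd and 
postgres process counts rise together.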

This is a typical output from top:

last pid: 12789;  load averages:  7.77, 10.77, 13.46    up 26+03:00:30  06:22:04
6637 processes: 7 running, 623 sleeping, 7 zombie
CPU: 32.9% user,  0.0% nice,  7.6% system,  0.6% interrupt, 58.9% idle
Mem: 3885M Active, 15G Inact, 3236M Wired, 627M Cache, 2465M Buf, 656M Free
Swap: 12G Total, 12M Used, 12G Free

  PID USERNAME       THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
66834 pgsql            1 118    0   443M   417M CPU2    2  16:17 99.46% postgres
11473 pgsql            1  72    0   441M   242M sbwait  5   0:02  4.59% postgres
11026 pgsql            1  47    0   439M   249M sbwait  7   0:01  3.17% postgres
 6642 www              1  48    0   236M 42928K select  0   0:01  2.29% httpd
10147 www              1  48    0   236M 44048K select  6   0:01  2.10% httpd
 3961 shopzeus        29  44    0   208M 96364K uwait   4  18.4H  1.37% python


Here is what I don't understand. "last pid" is increasing relatively 
slowly, i.e. there are no hidden short-lived processes. Only the first 
one or two processes are showing CPU load > 10%. The "CPU User%" value 
is about 50%. We have lots of free memory. The I/O load is almost 
nothing (see the gstat and iostat output below).

However, the load average is between 7 and 13! In fact, sometimes it is 
above 16. And everybody complains that the server is too slow.
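
For scale, the load averages can be divided by the number of cores. A 
minimal sketch, assuming 8 cores (two quad-core CPUs, hyper-threading 
not counted) and using the three values from the top header above; 
`load_per_core` is a hypothetical helper name:

```python
def load_per_core(load_averages, cores):
    """Divide each load average by the core count."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return [load / cores for load in load_averages]


# Values copied from the top header above; 8 cores is an assumption
# (two quad-core CPUs, hyper-threading not counted).
LOADS = (7.77, 10.77, 13.46)
CORES = 8

if __name__ == "__main__":
    labels = ("1 min", "5 min", "15 min")
    for label, value in zip(labels, load_per_core(LOADS, CORES)):
        print(f"{label:>6}: {value:.2f} per core")
```

A value above 1.0 per core would suggest that, on average, more threads 
were runnable than there were cores to run them.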

How can I find out what is causing the problem?

Example gstat output:

dT: 1.006s  w: 1.000s
  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
     0      0      0      0    0.0      0      0    0.0    0.0| ad4
     0      0      0      0    0.0      0      0    0.0    0.0| ad4s1
     0      0      0      0    0.0      0      0    0.0    0.0| ad4s1d
     0      0      0      0    0.0      0      0    0.0    0.0| da0
     0      0      0      0    0.0      0      0    0.0    0.0| da0s1
     1    304      3     34   14.0    301   7522    0.2    5.1| da1
     0      2      2     32   11.9      0      0    0.0    2.4| da2
     0      0      0      0    0.0      0      0    0.0    0.0| da3
     0      0      0      0    0.0      0      0    0.0    0.0| da4
     0      0      0      0    0.0      0      0    0.0    0.0| da0s1a
     0      0      0      0    0.0      0      0    0.0    0.0| da0s1b
     0      0      0      0    0.0      0      0    0.0    0.0| da0s1d
     0      0      0      0    0.0      0      0    0.0    0.0| da0s1e
     1    304      3     34   14.0    301   7522    0.3    5.3| da1s1
     0      2      2     32   11.9      0      0    0.0    2.4| da2s1
     0      0      0      0    0.0      0      0    0.0    0.0| da3s1
     0      0      0      0    0.0      0      0    0.0    0.0| da4s1
     1    304      3     34   14.0    301   7522    0.4    5.4| da1s1d
     0      2      2     32   11.9      0      0    0.0    2.4| da2s1d
     0      0      0      0    0.0      0      0    0.0    0.0| da3s1d

Example iostat output:

        tty             ad4              da0              da1             cpu
  tin  tout  KB/t tps  MB/s   KB/t tps  MB/s   KB/t tps  MB/s  us ni sy in id
    0   349 30.81  16  0.49  16.51  11  0.18  22.56 124  2.72  29  0  9  1 61
    0  9282  0.00   0  0.00   0.00   0  0.00  16.00   7  0.11  41  0 11  1 47
    0 12520  0.00   0  0.00   0.00   0  0.00  18.00   8  0.14  45  0 14  0 41
    0 12205  0.00   0  0.00   0.00   0  0.00   0.00   0  0.00  38  0 15  0 47

Example systat output:

                     /0   /1   /2   /3   /4   /5   /6   /7   /8   /9   /10
      Load Average >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

                     /0%  /10  /20  /30  /40  /50  /60  /70  /80  /90  /100
pgsql      postgres XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXXXXXXXXXXXXXXXXXXXX
www           httpd XXXXXXXXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXX
www           httpd XXXXXX
pgsql      postgres XXX
pgsql      postgres X
www           httpd X
root           intr X
www           httpd X
www           httpd X
www           httpd X
www           httpd X
shopzeus     python X
www           httpd X
www           httpd X
www           httpd X
www           httpd X
www           httpd X
www           httpd X
www           httpd X
www           httpd X
zeusd1       python X
www           httpd X
www           httpd X
www           httpd X
www           httpd X
www           httpd X
www           httpd X
www           httpd X


It looks like the server is almost idle. So how can the load average be 
12 or similar?

Thanks,

    Laszlo


