"Load Balancing": How Busy are the servers?
lists at stringsutils.com
Sun Jan 1 12:10:44 PST 2006
Marc G. Fournier writes:
> For all the technology, I was kinda hoping for some 'scientific formula'
> Now, I really hate to ask, but how do you use vmstat to get a feel for how
> busy the disk subsystem is?
For me, reading "Absolute BSD" by Michael Lucas was very helpful,
in particular Chapter 18, System Performance.
The three columns I look at in vmstat are "r" and "b" on the left, and
the fault column(s).
"r" shows how many processes are waiting for CPU, "b" shows how many
processes are waiting for disk. The fault column(s) show how badly your
system is accessing swap.
r b w
2 5 0
1 5 0
2 4 0
2 5 0
3 4 0
1 5 0
1 5 0
That's from my home machine while I am doing some backups.
The machine at this point is more disk bound than CPU bound, with 4 to 5
processes waiting for disk access at any point in time.
I am also falling behind on CPU, but not as badly.
On the far right of vmstat you also have CPU stats. In my case the vmstat
output for the above lines showed 70% to 90% idle, which confirmed I was disk
bound at that point.
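
As a rough illustration of reading those two columns, here is a minimal Python sketch that averages "r" and "b" over the sample lines above. The function name and the rule "more blocked than runnable means disk bound" are assumptions made for this example, not something vmstat itself reports.

```python
def average_queues(vmstat_text):
    """Return (avg_r, avg_b) over rows whose first two fields are integers."""
    r_total = b_total = count = 0
    for line in vmstat_text.splitlines():
        fields = line.split()
        # Skip the header and blank lines; data rows start with numeric r and b.
        if len(fields) < 2 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        r_total += int(fields[0])
        b_total += int(fields[1])
        count += 1
    if count == 0:
        raise ValueError("no vmstat data rows found")
    return r_total / count, b_total / count


# The r/b/w columns quoted earlier in this message.
SAMPLE = """\
r b w
2 5 0
1 5 0
2 4 0
2 5 0
3 4 0
1 5 0
1 5 0
"""

avg_r, avg_b = average_queues(SAMPLE)
# Assumption for illustration: more blocked than runnable suggests disk pressure.
verdict = "disk bound" if avg_b > avg_r else "CPU bound"
print(f"avg r={avg_r:.2f} avg b={avg_b:.2f} -> {verdict}")
```

The CPU idle figure on the far right of vmstat is still worth checking before trusting a verdict like this.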
The fault column shows you how actively you are using swap. The lines
above had between 30 and 200 approximately. If you look at swapinfo and you
have a large amount of swap in use and then you see a high number in vmstat
for fault, the machine is short on RAM for the load you have on it.
So far in my experience nothing hurts a machine as badly as hitting swap
(given that you have adequate CPU/disks). Once you start to hit swap heavily
you need to do something (if you can...) such as moving services to another
machine or putting in more memory.
Instead of looking for fixed numbers I think that relative figures are more
important.. like looking at your machines at their lowest usage and then at
their busiest.. or at spikes.. If at slow times of activity the machines are
already falling behind on "b", "r" on vmstat.. then that machine is
One possible quick way to start benchmarking your machines, until you can do
something better, is to capture snapshots of vmstat every 15 to 30 minutes
and take a look.. perhaps even write a short script to summarize them. On my
list of things to do is a simple setup of that nature, just because
it would be easy to set up and can provide very valuable information until
you set up something more feature rich.
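
A snapshot collector of that kind could look roughly like the Python sketch below. It is only a sketch under assumptions: the function names, the timestamp header format and the default command `vmstat 1 2` are choices made for this example (the first row vmstat prints is an average since boot, so a second sample is requested).

```python
import subprocess
import time
from datetime import datetime


def take_snapshot(command):
    """Run command once and return its output under a timestamp header."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    return f"=== {stamp} ===\n{result.stdout}"


def collect(log_path, command=("vmstat", "1", "2"),
            interval_seconds=15 * 60, samples=4):
    """Append `samples` snapshots to log_path, sleeping between them."""
    for i in range(samples):
        with open(log_path, "a") as log:
            log.write(take_snapshot(command))
        # No need to sleep after the final snapshot.
        if i < samples - 1:
            time.sleep(interval_seconds)
```

Calling `collect("/var/tmp/vmstat.log")` (a hypothetical path) would record four snapshots over 45 minutes. Running a single snapshot from cron instead of sleeping in a loop would work just as well.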
"top" in 5.X branch and up is also very userfull. If you hit "m" it shows
you disk processes so you can see what programs are doing the most I/O.
One thing to watch out for in top when using 'm': if you see all low
numbers (hit 'o' to sort and then type 'total'), you may still have lots
of programs each doing little I/O whose combined load is a problem for your
disk subsystem, like having 200+ IMAP connections. Each single IMAP
connection may not be doing more than a handful of transactions per second,
but all of them combined can give a disk subsystem a pretty good workout.
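
To make the arithmetic concrete, here is a small Python sketch that adds up per-process I/O by program name. The process names and figures are hypothetical, and the input is a plain list of (command, operations) pairs rather than parsed top output.

```python
from collections import defaultdict


def total_io_by_command(rows):
    """Sum I/O counts per command name, largest total first."""
    totals = defaultdict(int)
    for command, operations in rows:
        totals[command] += operations
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical figures: 200 IMAP connections doing 5 operations per second
# each, next to a single backup process doing 300.
rows = [("imapd", 5)] * 200 + [("dump", 300)]
ranked = total_io_by_command(rows)
for command, operations in ranked:
    print(f"{command:10s} {operations}")
```

No single IMAP line would stand out in a sorted per-process view, yet in this made-up case the combined total is more than three times that of the backup.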
The load averages from 'w' are also good figures for comparative tests. I
started to work on a script (it needs more work) that dumps 'w' and
'vmstat'.. next I have to work on parsing them and giving summaries. In
particular one wants to know peak times, since that is the best time to
determine if the machine can handle its load.. and more importantly spikes.
If a machine is usually under 2 and it spikes at 5+, that machine is
possibly able to do "normal" loads, but may not be able to handle spikes in
traffic (i.e. a customer doing a mailing list, or a site that just got press
and has a larger number of people than usual going to its URL).
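
The parsing step might be sketched in Python as follows. The sample lines are made up, the function names are assumptions for this example, and the threshold of 5 is just the figure used above, not a recommended value.

```python
import re
from statistics import median

# Matches both "load averages: 1.20, 0.90, 0.80" and "load average: ...".
LOAD_RE = re.compile(r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)")


def parse_load(line):
    """Return the (1, 5, 15 minute) load averages found in a 'w' header line."""
    match = LOAD_RE.search(line)
    if match is None:
        raise ValueError("no load averages in line")
    return tuple(float(value) for value in match.groups())


def summarize(one_minute_loads, spike_at=5.0):
    """Report the typical load, the peak, and any samples at or above spike_at."""
    return {
        "typical": median(one_minute_loads),
        "peak": max(one_minute_loads),
        "spikes": [load for load in one_minute_loads if load >= spike_at],
    }


# Hypothetical header lines captured from 'w' at different times.
captured = [
    "12:10PM  up 3 days,  2:04, 1 user, load averages: 1.20, 0.90, 0.80",
    "12:40PM  up 3 days,  2:34, 1 user, load averages: 1.50, 1.10, 0.90",
    " 1:10PM  up 3 days,  3:04, 2 users, load averages: 5.60, 3.20, 1.80",
    " 1:40PM  up 3 days,  3:34, 1 user, load averages: 1.10, 2.00, 1.70",
]
summary = summarize([parse_load(line)[0] for line in captured])
print(summary)
```

A typical value under 2 with a peak above 5 is the pattern described above: fine under normal load, but possibly not able to absorb spikes.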
I still think I have MUCH, MUCH to learn.. but I would be glad to expand on
anything mentioned above.. or anything else. Ultimately each machine/company
is unique enough that absolute numbers from other people (i.e. what is a good
value for 'r' and 'b' to be around most of the time) may be less important
than learning what the different figures are for your different machines
under "normal" operation.