"Load Balancing": How Busy are the servers?
Marc G. Fournier
scrappy at hub.org
Sun Jan 1 21:40:41 PST 2006
I just installed cacti, which seems fairly useful for 'long term views' of
how a server is doing ... now I have to figure out which SNMP MIBs relate
to all of the "important things" :(
On Sun, 1 Jan 2006, Francisco Reyes wrote:
> Marc G. Fournier writes:
>
>> For all the technology, I was kinda hoping for some 'scientific formula' :)
>
> There are..
>
>> Now, I really hate to ask, but how do you use vmstat to get a feel for how
>> busy the disk subsystem is?
>
> For me, reading "Absolute BSD" by Michael Lucas was very helpful.
> In particular Chapter 18, System performance.
>
> The three vmstat columns I look at are "r" and "b" on the left, and
> "fault".
>
> "r" shows how many processes are waiting for CPU, "b" shows how many
> processes are waiting for disk. The fault column(s) show how badly your
> system is accessing swap.
>
> Quick example:
> r b w
> 2 5 0
> 1 5 0
> 2 4 0
> 2 5 0
> 3 4 0
> 1 5 0
> 1 5 0
>
>
> That's from my home machine while I am doing some backups.
> The machine at this point is more disk bound than CPU bound, with 4 to 5
> processes waiting for disk access at any point in time.
>
> I am also falling behind on CPU, but not as badly.
>
> On the far right of vmstat you also have CPU stats.. in my case the vmstat
> from the above lines showed 70% to 90% idle, which confirmed I was disk
> bound at that point.
> The fault column shows you how actively you are using swap. The lines above
> had between 30 and 200 approximately. If you look at swapinfo and you have a
> large amount of swap in use and then you see a high number in vmstat for
> fault, the machine is short on RAM for the load you have on it.
>
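To make the reading of the quoted "r b w" rows concrete, here is a minimal
Python sketch that averages each column of that sample. The "longer queue is
the tighter resource" comparison is only an illustrative assumption, not a
rule from the book or from vmstat itself.

```python
# Sample "r b w" rows copied from the quoted vmstat output.
SAMPLE = """\
2 5 0
1 5 0
2 4 0
2 5 0
3 4 0
1 5 0
1 5 0
"""


def column_means(text):
    """Return the mean of each whitespace-separated numeric column."""
    rows = [[int(value) for value in line.split()]
            for line in text.splitlines() if line.strip()]
    return [sum(column) / len(rows) for column in zip(*rows)]


mean_r, mean_b, mean_w = column_means(SAMPLE)

# Assumed rule of thumb for illustration: compare the two queue lengths.
bottleneck = "disk" if mean_b > mean_r else "cpu"
print(f"mean r={mean_r:.2f}  mean b={mean_b:.2f}  -> more {bottleneck} bound")
```

On this sample the "b" average is well above the "r" average, which matches
the "more disk bound than CPU bound" reading in the quoted text.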
> So far in my experience nothing hurts a machine as badly as hitting swap
> (given that you have adequate CPU/disks). Once you start to hit swap heavily
> you need to do something (if you can...) such as moving services to another
> machine or putting in more memory.
>
> Instead of looking for fixed numbers I think that relative figures are more
> important.. like looking at your machines at their lowest usage and then at
> their busiest.. or at spikes.. If at slow times of activity the machines are
> already falling behind on "b" and "r" in vmstat.. then that machine is
> overloaded.
>
> One possible quick way to start benchmarking your machines, until you can
> do something better, is to capture snapshots of vmstat every 15 to 30
> minutes and take a look.. perhaps even write a short script to summarize
> it. On my list of things to do.. is a simple setup of that nature.. just
> because it would be easy to set up and can provide very valuable
> information until you set up something more feature rich.
>
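A rough sketch of the "capture snapshots and summarize" idea, in Python. It
assumes FreeBSD-style vmstat output in which the header row starts with "r"
and contains "b" and "id" columns; the SAMPLE text is made up for
illustration, and the function names (parse_vmstat, summarize, snapshot) are
hypothetical. Scheduling it every 15 to 30 minutes (for example from cron) is
left out.

```python
import subprocess

# Made-up sample in the FreeBSD vmstat layout, for illustration only.
SAMPLE = """\
 procs      memory      page                    disks     faults         cpu
 r b w     avm    fre   flt  re  pi  po    fr  sr ad0 ad1   in   sy   cs us sy id
 2 5 0  412340  81232   120   0   1   0   110   0  45   0  410  980  620  6  4 90
 1 5 0  412512  80996    60   0   0   0    58   0  52   0  398  870  590  9  6 85
 3 4 0  413020  80544   190   0   2   0   170   0  48   0  455 1200  700 18 10 72
"""


def parse_vmstat(text):
    """Extract (r, b, idle) tuples from vmstat output."""
    samples = []
    columns = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r" and "b" in fields:
            columns = fields  # header row naming each column
            continue
        if columns is None or not fields[0].isdigit():
            continue  # banner line ("procs memory ...") or other noise
        samples.append((int(fields[columns.index("r")]),
                        int(fields[columns.index("b")]),
                        int(fields[columns.index("id")])))
    return samples


def summarize(samples):
    """Return mean and peak of r and b, plus the lowest idle percentage."""
    rs, bs, idles = zip(*samples)
    count = len(samples)
    return {"mean_r": sum(rs) / count, "max_r": max(rs),
            "mean_b": sum(bs) / count, "max_b": max(bs),
            "min_idle": min(idles)}


def snapshot(interval=5, count=3):
    """Run vmstat and return its raw output (not called in this sketch)."""
    result = subprocess.run(["vmstat", str(interval), str(count)],
                            capture_output=True, text=True, check=True)
    return result.stdout


summary = summarize(parse_vmstat(SAMPLE))
print(summary)
```

Appending each summary to a log with a timestamp would give the "lowest usage
versus busiest" comparison described earlier without needing anything more
feature rich.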
> "top" in the 5.X branch and up is also very useful. If you hit "m" it shows
> you per-process disk activity so you can see what programs are doing the
> most I/O.
>
> One thing to watch out for in top when using 'm': if you see all low
> numbers (hit 'o' to sort and then type 'total'), you may still have lots
> of programs each doing little I/O whose combined load is a problem for your
> disk subsystem.. like having 200+ IMAP connections. Each single IMAP
> connection may not be doing more than a handful of transactions per second,
> but all of them combined can give a disk subsystem a pretty good workout.
>
> The load averages from 'w' are also good figures for comparative tests. I
> started to work on a script (it needs more work) that dumps 'w' and 'vmstat'
> .. next I have to work on parsing them and giving summaries. In particular
> one wants to know peak times.. since that is the best time to determine if
> the machine can handle its load.. and more importantly spikes. If a machine
> is usually under 2.. and it spikes at 5+.. that machine is possibly able to
> do "normal" loads, but may not be able to handle spikes in traffic (i.e. a
> customer doing a mailing list, or a site just got press.. and there are a
> larger number than usual of people going to their URL).
>
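The spike check on load averages could be sketched like this in Python. The
header lines in LINES are invented examples of the first line printed by 'w'
or 'uptime', and the "more than twice the median" threshold is an assumption
chosen to match the "usually under 2, spikes at 5+" example, not an
established rule.

```python
import re
from statistics import median

# Matches both "load averages:" (FreeBSD) and "load average:" (Linux).
LOAD_RE = re.compile(r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)")

# Invented sample header lines, one per snapshot.
LINES = [
    " 9:00PM  up 12 days,  2:24, 2 users, load averages: 1.20, 1.10, 1.05",
    " 9:15PM  up 12 days,  2:39, 2 users, load averages: 1.80, 1.40, 1.20",
    " 9:30PM  up 12 days,  2:54, 3 users, load averages: 5.60, 3.10, 2.00",
    " 9:45PM  up 12 days,  3:09, 2 users, load averages: 1.50, 2.40, 2.10",
]


def one_minute_load(line):
    """Return the 1-minute load average from a 'w'/'uptime' header line."""
    match = LOAD_RE.search(line)
    if match is None:
        raise ValueError(f"no load averages found in: {line!r}")
    return float(match.group(1))


def find_spikes(loads, factor=2.0):
    """Return (typical, spikes): the median load and samples above factor x it."""
    typical = median(loads)
    return typical, [load for load in loads if load > typical * factor]


loads = [one_minute_load(line) for line in LINES]
typical, spikes = find_spikes(loads)
print(f"typical load {typical:.2f}, spikes: {spikes}")
```

The median is used for the "usual" level so that a single spike does not
drag the baseline up the way a plain average would.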
> I still think I have MUCH, MUCH to learn.. but I would be glad to expand on
> anything mentioned above.. or anything else. Ultimately each machine/company
> is unique enough that absolute numbers from other people (i.e. what is a good
> value for 'r' and 'b' to be around most of the time) may be less important
> than learning what the different figures are for your different machines
> under "normal" operation.
>
>
----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: scrappy at hub.org Yahoo!: yscrappy ICQ: 7615664