tool to determine server stability issues
zszalbot at gmail.com
Wed Mar 4 01:53:38 PST 2009
I am not sure whether it was the upgrade to perl 5.8.9 that started my
problem, but I am seeing strange server behaviour. An episode usually
lasts about 5 minutes, during which the system becomes unresponsive.
top tells me there are two perl processes run by user www, each of
which uses 100% of a CPU. The server has four CPUs, so that in itself
is OK. What is strange, though, is that during such a storm the
outgoing bandwidth is completely used up, and this is the reason the
server becomes unresponsive. Normally it does happen that the
bandwidth is taken almost completely by a remote backup job, but I
have priority queueing with pf and it has never been a problem: a site
is served fast even though the bandwidth is taken up, because httpd
traffic has higher priority. Also, in this particular case the backup
job is not involved (especially since the perl processes are run by
user www), so it must be something else.
I have looked through Apache's logs but cannot find anything strange
(normal traffic, without any sign of DoS activity, etc.).
I have turned on debugging in HotSanic, which I use for traffic/system
measurement, but it would not generate outgoing traffic.
I guess I am looking for advice on how to debug this. I often spot the
problem when it is about to end, so I do not have enough time to start
more detailed monitoring (also, I am not sure which tool would be best
to use). I'd appreciate any advice on how to troubleshoot this and
find the source of the problem.
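Since the storms end before I can react by hand, one option might be a
small watcher that polls ps and saves a snapshot as soon as the www
perl processes get busy. The following is only a rough sketch under
assumptions, not something I have running: the 90% threshold, the
10-second interval, the log path, the function names and the choice of
netstat/sockstat as snapshot commands are all placeholders.

```python
import subprocess
import time

# All of these values are assumptions for illustration only.
CPU_THRESHOLD = 90.0                  # percent CPU that counts as "busy"
WATCH_USER = "www"                    # owner of the runaway processes
WATCH_COMMAND = "perl"                # substring to look for in the command
POLL_SECONDS = 10                     # how often to poll ps
LOG_PATH = "/var/tmp/perl-storm.log"  # hypothetical log location


def find_hot_processes(ps_output, user=WATCH_USER, command=WATCH_COMMAND,
                       threshold=CPU_THRESHOLD):
    """Parse `ps -axww -o user,pid,pcpu,args` output.

    Returns a list of (pid, cpu, args) for processes owned by `user`
    whose command line contains `command` and whose CPU usage is at
    or above `threshold`.
    """
    hot = []
    for line in ps_output.splitlines()[1:]:   # skip the header row
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue
        proc_user, pid, cpu, args = fields
        try:
            pid_value = int(pid)
            cpu_value = float(cpu)
        except ValueError:
            continue                          # ignore lines we cannot parse
        if proc_user == user and command in args and cpu_value >= threshold:
            hot.append((pid_value, cpu_value, args.strip()))
    return hot


def snapshot(hot, log_path=LOG_PATH):
    """Append the busy processes and the current socket state to the log."""
    with open(log_path, "a") as log:
        log.write("=== %s ===\n" % time.strftime("%Y-%m-%d %H:%M:%S"))
        for pid, cpu, args in hot:
            log.write("pid=%d cpu=%.1f cmd=%s\n" % (pid, cpu, args))
        for cmd in (["netstat", "-an"], ["sockstat", "-4"]):
            result = subprocess.run(cmd, capture_output=True, text=True)
            log.write("--- %s ---\n%s\n" % (" ".join(cmd), result.stdout))


def main():
    while True:
        ps = subprocess.run(["ps", "-axww", "-o", "user,pid,pcpu,args"],
                            capture_output=True, text=True)
        hot = find_hot_processes(ps.stdout)
        if hot:
            snapshot(hot)
        time.sleep(POLL_SECONDS)


if __name__ == "__main__":
    main()
```

The logged command lines should at least show which script the perl
processes are running, and the socket listing shows which remote
addresses they are talking to at that moment.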
Today I managed to run netstat during the outage (my ssh session was
already open, so I was able to continue; otherwise I would not have
been able to reach the server). I can provide its output if it is of
any use.
I have never had anything like this before, so I am in the dark here.
I am running FreeBSD 7.0-RELEASE-p9 #3.
Many thanks in advance!