massive load average spikes
markham_breitbach at ssimicro.com
Wed Aug 11 21:43:47 UTC 2010
> load average is a time-averaged thing and in the case of a
> 'thundering herd' problem you will see the LA spike up and
> come down again over time.
> Do you see any problem as a result of this? Or is it just curiosity?
> You might want to use KTR or ktrace with scheduling events if you
> really want to see the reason for this. It could just be a sampling
> error when some 'tick' coincides with the sampling.
I have not seen any noticeable performance degradation when the LA spikes like this, and
the main nuisance of this was Sendmail's behaviour. I have since set the options
"RefuseLA=0" and "QueueLA=0" to avoid long stretches of SMTP being unavailable while the
load averaged itself out.
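
A rough sketch of why a short burst can keep the LA high for a long stretch afterwards. It assumes the classic exponentially damped average (a sample of the run queue every 5 seconds, folded into a 1-minute window); the function names, the burst size of 200 runnable processes and the threshold of 12 are all made up for illustration and are not measurements from this host.

```python
import math

SAMPLE_INTERVAL = 5.0   # seconds between run-queue samples (assumed)
WINDOW = 60.0           # averaging window for the 1-minute load average


def step(load, runnable, interval=SAMPLE_INTERVAL, window=WINDOW):
    """Fold one run-queue sample into an exponentially damped average."""
    decay = math.exp(-interval / window)
    return load * decay + runnable * (1.0 - decay)


def simulate(burst_runnable, burst_seconds, idle_runnable, total_seconds):
    """Return (time, load) pairs for a short burst followed by idle load."""
    load = float(idle_runnable)
    history = []
    t = 0.0
    while t < total_seconds:
        # The run queue is large only while the burst lasts.
        runnable = burst_runnable if t < burst_seconds else idle_runnable
        load = step(load, runnable)
        t += SAMPLE_INTERVAL
        history.append((t, load))
    return history


def seconds_above(history, threshold):
    """Total time the simulated load average stays above a threshold."""
    return sum(SAMPLE_INTERVAL for _, load in history if load > threshold)


if __name__ == "__main__":
    # Hypothetical numbers: 200 runnable processes for 10 seconds.
    hist = simulate(burst_runnable=200, burst_seconds=10,
                    idle_runnable=1, total_seconds=600)
    for t, load in hist[::6]:
        print(f"t={t:5.0f}s  load={load:6.2f}")
    print("seconds above 12:", seconds_above(hist, 12))
```

In this model the average stays above the threshold for much longer than the burst itself lasts, which would be consistent with a load-based refusal limit keeping SMTP closed well after the herd has gone.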
At this point it is really just a nagging feeling that something is misbehaving and it's
going to bite me when I least expect it (it always does!), so I would like to try and
track down the source of the problems, but I'm not even sure where to begin looking.
I have run some ktrace on sendmail and dovecot, but did not see anything that stood out,
although I don't really know if I would recognize the problem in a kdump anyway (too much
information!). I'm not at all familiar with KTR, however. Is this something that can be
run on a production host or should it be isolated to a dev box? I have cloned the jail
into a dev environment on identical hardware, but I only see the issue in production.
I'm not sure whether this is a matter of insufficient load or just not enough random
strangeness outside of production.
Any suggestions for how KTR might help pin this down or what to look for?