5.3 in diskless cluster: irregular reboots at 14:09 hr. ?!?!

Rob spamrefuse at yahoo.com
Wed Dec 29 02:06:47 PST 2004


Colin J. Raven wrote:
> On Dec 29, Rob launched this into the bitstream:
> 
>>
>> I'm running 5.3-Stable on all PCs.
>>
>> I have a master/router with 7 diskless slaves. One of the
>> slaves shows irregular reboots, without a trace, not even
>> a shutdown message in the logs.
>>
>> So far, one particular slave has suddenly rebooted at the
>> following times:
>>        Nov. 16 14:09:41
>>        Nov. 30 14:09:23
>>        Dec. 28 14:09:34
>>
>> Each is at almost exactly the same time of day; rather peculiar, isn't it?
>>
>> Any idea what's going on here, or how to trace this problem?
> 
> 
> What *else* is happening at (or immediately before) 14:09 on this
> machine? For example, is something rather intense occurring immediately
> beforehand? I'm thinking of a power supply failing when it gets loaded
> beyond a certain point. So, is there maybe a big log grep happening
> beforehand, or some other event that stresses components and thus
> draws more power?

Thank you Colin.

What would be a good command to run to find out how heavily loaded the
PC is right before the reboot? Is 'top' good enough? Or is there
something better, 'ps auxw' for example?

Since I don't know on which date it will happen next, I will run a
cron job each day at 14:08 to record how heavily loaded the PC is and
write the result to disk.
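
A minimal sketch of such a cron job, under some assumptions: the script
name loadsnap.sh, the log path, and the sample count and interval are all
made up for illustration, and the log has to live on a file system that
survives the reboot (on a diskless slave that probably means an NFS mount
rather than a memory-backed /var). It takes several snapshots instead of
one, so the log keeps running up to the moment the machine goes down.

```shell
#!/bin/sh
# loadsnap.sh: hypothetical helper, not part of FreeBSD.

# Append one timestamped snapshot of load and processes to the file "$1".
snapshot() {
    {
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') ==="
        uptime          # load averages
        ps auxw         # per-process CPU and memory use
    } >> "$1" 2>&1
    sync                # ask the system to flush buffered writes
}

# Take "$2" snapshots into file "$1", "$3" seconds apart.
sample_loop() {
    i=0
    while [ "$i" -lt "$2" ]; do
        snapshot "$1"
        i=$((i + 1))
        if [ "$i" -lt "$2" ]; then
            sleep "$3"
        fi
    done
}

# Run only when a log file is named on the command line.
# Defaults: 24 snapshots, 5 seconds apart (roughly two minutes).
if [ "$#" -ge 1 ]; then
    sample_loop "$1" "${2:-24}" "${3:-5}"
fi
```

A crontab entry along these lines (paths again hypothetical) would start
it at 14:08 every day:

    8 14 * * * /bin/sh /root/loadsnap.sh /root/loadsnap.log 24 5

The last block in the log before each reboot then shows what was running
at the time, if anything unusual was.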

> It has that funny "I'll bet the PSU is on the way out" feeling to it,
> but actually proving that can be tedious.

I may also swap the UPSes between two slaves and see if the reboots are
related to a shaky UPS. I don't want to replace the PSU yet :(.

Rob.

