9.0 spontaneously reboots

Volodymyr Kostyrko c.kworr at gmail.com
Tue Mar 13 09:00:02 UTC 2012


Matthew Seaman wrote:
>> The only load I know of that reliably causes a lockup within some hours is
>> memcached. Right now the project is migrated to redis and the machines
>> survive for two weeks. The most common problem at lockup is an ECC error.
>
> I see.  That puts a different complexion on things.  Although it is
> application specific it doesn't rule out hardware problems.  In fact,
> given the nature of the error -- ECC problems -- it pretty much nails it
> as something wrong with the RAM in that machine.
>
> Given that memtest86 doesn't show any problems, and you can run a
> similar workload with different software it suggests that you have a
> memory stick (or sticks) that are marginal.  Something like extra heat
> due to higher rates of memory accesses from a particular application
> could be tipping it over the edge into failure.
>
> The 'marginal' behaviour need not be a fault in the memory stick per se.
>   It could simply be the particular characteristics of the memory you
> have installed not being exactly compatible with your motherboard.  In
> theory the memory conforming to a particular standard should avoid this
> sort of problem, but this is unfortunately not completely infallible.
> Swapping out memory sticks for an equivalent specification from a
> different manufacturer should give good results.

I already moved from Kingston to Hynix with no luck. My next guess is a
motherboard problem (as the memory is split between the processors) or a
processor problem. I'm going to pull one processor out, leaving all the
memory on the other one.

The only other weird thing about this server is:

dev.cpu.0.temperature: 37,0C
dev.cpu.1.temperature: 37,0C
dev.cpu.2.temperature: 35,0C
dev.cpu.3.temperature: 35,0C
dev.cpu.4.temperature: 43,0C
dev.cpu.5.temperature: 43,0C
dev.cpu.6.temperature: 38,0C
dev.cpu.7.temperature: 38,0C
dev.cpu.8.temperature: 38,0C
dev.cpu.9.temperature: 38,0C
dev.cpu.10.temperature: 37,0C
dev.cpu.11.temperature: 37,0C
dev.cpu.12.temperature: 33,0C
dev.cpu.13.temperature: 33,0C
dev.cpu.14.temperature: 34,0C
dev.cpu.15.temperature: 34,0C

And it's consistent - cores 4 and 5 are always hotter than any of the
others. This could be something to do with the scheduler, but it started
before any actual load. Though the numbers are normal, I have never seen
anything like it...
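
For anyone who wants to watch for the same pattern over time, here is a rough sketch in Python that parses sysctl-style output like the listing above and flags cores running noticeably hotter than the rest. The function names and the 5 degree margin are my own assumptions for illustration, not anything coretemp defines, and the sample is simply the listing above pasted in.

```python
import re
from statistics import median

# Matches lines such as "dev.cpu.4.temperature: 43,0C".
# The decimal separator may be a comma or a dot depending on locale.
LINE_RE = re.compile(r"dev\.cpu\.(\d+)\.temperature:\s*(\d+)[.,](\d+)C")


def parse_temperatures(text):
    """Return {core_index: temperature_in_C} parsed from sysctl-style text."""
    temps = {}
    for match in LINE_RE.finditer(text):
        core, whole, frac = match.groups()
        temps[int(core)] = float(f"{whole}.{frac}")
    return temps


def hot_cores(temps, margin=5.0):
    """Return cores more than `margin` degrees above the median.

    `margin` is an arbitrary illustrative threshold, not a recommended value.
    """
    mid = median(temps.values())
    return sorted(core for core, t in temps.items() if t - mid > margin)


# Sample copied from the listing in this message.
SAMPLE = """\
dev.cpu.0.temperature: 37,0C
dev.cpu.1.temperature: 37,0C
dev.cpu.2.temperature: 35,0C
dev.cpu.3.temperature: 35,0C
dev.cpu.4.temperature: 43,0C
dev.cpu.5.temperature: 43,0C
dev.cpu.6.temperature: 38,0C
dev.cpu.7.temperature: 38,0C
dev.cpu.8.temperature: 38,0C
dev.cpu.9.temperature: 38,0C
dev.cpu.10.temperature: 37,0C
dev.cpu.11.temperature: 37,0C
dev.cpu.12.temperature: 33,0C
dev.cpu.13.temperature: 33,0C
dev.cpu.14.temperature: 34,0C
dev.cpu.15.temperature: 34,0C
"""

print(hot_cores(parse_temperatures(SAMPLE)))  # [4, 5]
```

Comparing against the median rather than the mean keeps the two hot cores from dragging the baseline up. Feeding it periodic snapshots taken at idle and under load would show whether the offset on cores 4 and 5 really is independent of load.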

-- 
Sphinx of black quartz judge my vow.


More information about the freebsd-questions mailing list