kernel MCA messages

Andriy Gapon avg at icyb.net.ua
Tue Aug 24 20:03:33 UTC 2010


on 24/08/2010 22:51 Artem Belevich said the following:
> IMHO the key here is whether hardware is broken or not. The only case
> where correctable ECC errors are OK is when a bit gets flipped by a
> high-energy particle. That's a normal but fairly rare event. If you
> get bit flips often enough that you can recall details of more than
> one of them on the same hardware, my guess would be that you're
> dealing with something else -- bad/marginal memory, signal integrity
> issues, power issues, overheating... The list continues. In all those
> cases hardware does *not* work correctly. Whether you can (or want to)
> keep running stuff on the hardware that is broken is another question.

Have you read the article? :)
If not, read at least the summary.

> On Tue, Aug 24, 2010 at 1:15 AM, Andriy Gapon <avg at icyb.net.ua> wrote:
>> on 24/08/2010 09:14 Ronald Klop said the following:
>>>
>>> A little off topic, but what is 'a low rate of corrected ECC errors'? At work
>>> one machine has them like once per day, but runs ok. Is once per day much?
>>
>> That's up to your judgment.  It's like deciding after how many remapped
>> sectors you replace an HDD.
>> You may find this interesting:
>> http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf
>>
>> --
>> Andriy Gapon

-- 
Andriy Gapon


More information about the freebsd-stable mailing list