kernel MCA messages

Artem Belevich fbsdlist at src.cx
Tue Aug 24 19:51:54 UTC 2010


IMHO the key here is whether the hardware is broken or not. The only
case where correctable ECC errors are OK is when a bit gets flipped by
a high-energy particle. That's a normal but fairly rare event. If you
get bit flips often enough that you can recall details of more than
one of them on the same hardware, my guess would be that you're
dealing with something else -- bad/marginal memory, signal integrity
issues, power issues, overheating... the list goes on. In all those
cases the hardware does *not* work correctly. Whether you can (or want
to) keep running stuff on hardware that is broken is another question.
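
To make that rule of thumb concrete, here is a minimal sketch that
counts corrected-error events per machine and flags any machine that
has logged more than one. The event list, the host names and the
function name flag_suspect_hosts are made up for illustration; this
does not parse real MCA output, and the threshold of one is only the
informal heuristic from the paragraph above, not a vendor guideline.

```python
from collections import Counter


def flag_suspect_hosts(events, threshold=1):
    """Return the hosts that logged more than `threshold` corrected errors.

    `events` is an iterable of (host, timestamp) pairs. The format is
    hypothetical; adapt it to however your logs are actually collected.
    """
    counts = Counter(host for host, _timestamp in events)
    return sorted(host for host, n in counts.items() if n > threshold)


# Hypothetical sample data: hostA has two corrected errors, hostB has one.
events = [
    ("hostA", "2010-08-20"),
    ("hostB", "2010-08-21"),
    ("hostA", "2010-08-23"),
]

print(flag_suspect_hosts(events))
```

A single event on a machine is consistent with a stray particle hit;
repeats on the same machine are the cue to look at memory, power and
cooling.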

--Artem



On Tue, Aug 24, 2010 at 1:15 AM, Andriy Gapon <avg at icyb.net.ua> wrote:
> on 24/08/2010 09:14 Ronald Klop said the following:
>>
>> A little off topic, but what is 'a low rate of corrected ECC errors'? At work
>> one machine has them like once per day, but runs ok. Is once per day much?
>
> That's up to your judgment.  It's like deciding after how many remapped
> sectors you replace an HDD.
> You may find this interesting:
> http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf
>
> --
> Andriy Gapon

