FreeBSD 6.x CVSUP today crashes with zero load ...

Dmitry Pryanishnikov dmitry at atlantis.dp.ua
Tue Jun 27 06:41:26 UTC 2006


On Tue, 27 Jun 2006, M.Hirsch wrote:
> Yes, the result may be correct.

  If you're talking about a single-bit error, you aren't quite correct. It
isn't "may be correct", it's _definitely_ correct (in the mathematical sense;
that is, the correcting code proves that we have one and only one error, in
bit number N, the hardware just inverts this bit, and the result _is_ OK).
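
  To illustrate "the code identifies bit number N", here is a minimal sketch
using the toy Hamming(7,4) code. It is an assumption for illustration only:
real ECC memory typically uses a much wider code (for example 64 data bits
plus 8 check bits), and the function names below are hypothetical, not taken
from any FreeBSD or chipset interface.

```python
# Toy Hamming(7,4) single-error correction (illustrative sketch only).
# Codeword positions are numbered 1..7; parity bits sit at 1, 2 and 4,
# data bits at 3, 5, 6 and 7.

DATA_POSITIONS = (3, 5, 6, 7)
PARITY_POSITIONS = (1, 2, 4)


def hamming_encode(data4):
    """Encode 4 data bits into a 7-bit codeword (list of 0/1)."""
    code = [0] * 8  # index 0 is unused so that indices match positions
    for pos, bit in zip(DATA_POSITIONS, data4):
        code[pos] = bit
    for p in PARITY_POSITIONS:
        # Parity bit p covers every position whose number has bit p set.
        code[p] = sum(code[i] for i in range(1, 8) if i & p and i != p) % 2
    return code[1:]


def hamming_correct(word7):
    """Return (data bits, syndrome). A nonzero syndrome is the number N
    of the single flipped bit, which is inverted back before decoding."""
    code = [0] + list(word7)
    syndrome = 0
    for p in PARITY_POSITIONS:
        if sum(code[i] for i in range(1, 8) if i & p) % 2:
            syndrome |= p
    if syndrome:
        code[syndrome] ^= 1  # "hardware just inverts this bit"
    return [code[i] for i in DATA_POSITIONS], syndrome


if __name__ == "__main__":
    original = [1, 0, 1, 1]
    stored = hamming_encode(original)
    stored[4] ^= 1  # simulate a single-bit upset at position 5
    recovered, n = hamming_correct(stored)
    print("flipped bit number:", n, "data intact:", recovered == original)
```

  Under the assumption that at most one bit flipped, the syndrome is exactly
the position of the bad bit, so the corrected data is the original data, not
merely a likely guess.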

> 'Do not take "ECC" for "equals additional security"'

  Not security. ECC adds reliability.

> But, in FreeBSD, the function is a result of hardware-level correction. 
> Something that only kicks in in _real_ _serious_ situations.
> I just would like you (not specifically you, Dmitry) to acknowledge that 
> broken RAM is worth a "panic" in "standard situations"- if I may call it like 
> that.

   The predominant RAM errors are exactly the single-bit ones. Moreover, 
they usually _don't_ reappear at the same cell. They may, for example, be 
caused by spontaneous alpha radioactivity (brought into your computer by 
ordinary dust), and as such they don't indicate that the RAM module must be 
replaced. They just corrupt your data in an unpredictable way, not your 
hardware. These single-bit errors are the main reason why ECC-capable memory 
and an ECC-capable chipset must be used in any computer that calculates or 
transfers actually valuable data.

> If the RAM is broken for some bits, chances are great that there are more 
> following soon.

  If a multiple-bit error happens, then yes, it can be a sign of an actual 
hardware fault. And yes, the ECC logic will report this event instantly.
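
  As a rough sketch of how the logic can tell the two cases apart, the toy
code below adds one overall parity bit to Hamming(7,4), giving a SECDED
(single-error-correct, double-error-detect) code. Again, this is an assumed
illustration with hypothetical names, not how any particular chipset
implements it.

```python
# Toy SECDED check: Hamming(7,4) plus one overall parity bit (8 bits total).
# Illustrative sketch only; real ECC memory uses wider codewords.

DATA_POSITIONS = (3, 5, 6, 7)
PARITY_POSITIONS = (1, 2, 4)


def secded_encode(data4):
    """Encode 4 data bits into 7 Hamming bits followed by overall parity."""
    code = [0] * 8  # index 0 unused so that indices match positions 1..7
    for pos, bit in zip(DATA_POSITIONS, data4):
        code[pos] = bit
    for p in PARITY_POSITIONS:
        code[p] = sum(code[i] for i in range(1, 8) if i & p and i != p) % 2
    overall = sum(code[1:]) % 2  # makes the total number of 1 bits even
    return code[1:] + [overall]


def secded_check(word8):
    """Classify a stored word as 'ok', 'corrected' or 'uncorrectable'."""
    code = [0] + list(word8[:7])
    syndrome = 0
    for p in PARITY_POSITIONS:
        if sum(code[i] for i in range(1, 8) if i & p) % 2:
            syndrome |= p
    parity_even = sum(word8) % 2 == 0
    if syndrome == 0 and parity_even:
        return "ok"
    if not parity_even:
        # Odd overall parity: treated as one flipped bit, which is fixable
        # (a zero syndrome here means the overall parity bit itself flipped).
        return "corrected"
    # Nonzero syndrome but even overall parity: two bits flipped.
    return "uncorrectable"


if __name__ == "__main__":
    word = secded_encode([1, 0, 1, 1])
    one_flip = list(word)
    one_flip[2] ^= 1
    two_flips = list(one_flip)
    two_flips[5] ^= 1
    print(secded_check(word), secded_check(one_flip), secded_check(two_flips))
```

  The "uncorrectable" case is the one that is reported rather than silently
fixed, and it is the case worth treating as a possible hardware fault.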

Sincerely, Dmitry
-- 
Atlantis ISP, System Administrator
e-mail:  dmitry at atlantis.dp.ua
nic-hdl: LYNX-RIPE


More information about the freebsd-stable mailing list