FreeBSD 6.x CVSUP today crashes with zero load ...
M.Hirsch
webmaster at hirsch.it
Mon Jun 26 23:07:39 UTC 2006
Dmitry Pryanishnikov wrote:
> When you wrote "ECC is a way to mask broken hardware", you were plain
> wrong.
> If you're using hardware w/o ECC, it just can't tell whether an error is
> present or absent. So ECC _is_ the way to detect (not mask) broken hardware.
>
Ok, thanks. I think I understand the meaning of ECC now.
So, contrary to what my supplier claims, ECC is not supposed to protect
against hardware failures.
But it is the way to detect them, right?
> If you want the ECC corrector to raise an NMI on corrected errors (as well as
> on uncorrectable ones), just set the appropriate bit in the control register -
> every Intel ECC-capable chipset allows it. But if we're speaking about a
> production environment, such behaviour (abnormal termination on a
> _corrected_ error) is unacceptable.
"Abnormal termination" is not only acceptable for me, it is what I am
looking for.
Make the node crash completely, so that one of the others can take over its
task(s).
>> Don't get me wrong, but tracking down bugs in FreeBSD is considerably more
>> effort than "just" acquiring a new box...
>
> I don't see the connection between this sentence and ECC (which is a
> hardware option).
What I wanted to say:
Looking for errors in the logs takes only a few seconds.
Finding out what caused them takes hours...
Acquiring a new box costs only $29.95 ;) - that's about 30 minutes of work, if
you look at it from the business side. ... I would rather rent 100 boxes to do
the task of ten than employ 100 admins to find the "real" problem.
Thanks, Dmitry. I think I know what to look for now...
M.
More information about the freebsd-stable mailing list