DS10L - "processor correctable error"

Bernd Walter ticso at cicely12.cicely.de
Thu Feb 7 14:06:26 PST 2008


On Thu, Feb 07, 2008 at 09:52:54AM +0000, Dieter wrote:
> > > > > "Warning: received processor correctable error."
> 
> > > > This is an ECC memory correction.
> 
> > > Can I know which DIMM (DS10L has 2 DIMMs) is faulty?
> 
> > > The message appears approx. once every other pass.
> > > The address is always the same.
> 
> If you decide to replace one, you could pick one at random and
> see if the error goes away.  Since your error is repeatable
> you can run the test to see if you guessed right.  Since there
> are only 2 DIMMs you have a 50% chance of guessing right the 1st
> time and 100% the 2nd time.
> 
> > Alphas are using the memory in pairs and can correct multiple faulty
> > bits in a single dataword.
> 
> Really?  I've always assumed that it was the standard single error
> correction double error detection.

Alphas use at least 128-bit data words, and some Alphas even use 256-bit
words, so there are 16 or even 32 bits available for ECC. That allows
correcting more than just a single bit.
(Some?) Alphas use a multi-stage correction mechanism.
The first stage is done in hardware, and if the hardware fails to handle
the error, it is corrected in software by a PALcode handler.
Maybe some non-Alpha systems do multi-bit correction too, since a
modern i386/amd64 has at least 8 bits for ECC, but I only know it for
sure with Alphas.
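
To put rough numbers on the check-bit argument, here is a small sketch
that computes the minimum number of check bits a textbook Hamming
SEC-DED code needs for a given data word width and compares it with the
ECC bits mentioned above. The function name is made up for illustration,
and pairing the 8 ECC bits with a 64-bit word on i386/amd64 is an
assumption. Spare bits only show that there is room for a stronger code;
they say nothing about what a particular chipset actually implements.

```python
def secded_check_bits(data_bits: int) -> int:
    """Minimum check bits for a Hamming SEC-DED code over data_bits."""
    r = 0
    # Single-error correction needs 2**r >= data_bits + r + 1.
    while 2 ** r < data_bits + r + 1:
        r += 1
    # One extra overall parity bit adds double-error detection.
    return r + 1


# (data word width, ECC bits available): the 128/256-bit figures come
# from this message; the 64-bit pairing is an assumption.
for width, available in ((64, 8), (128, 16), (256, 32)):
    need = secded_check_bits(width)
    print(f"{width}-bit word: SEC-DED needs {need} check bits, "
          f"{available} available, {available - need} spare")
```

With these assumptions, a 64-bit word with 8 ECC bits has nothing left
over after plain SEC-DED, while the wider words leave several bits
unused by it.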

-- 
B.Walter                http://www.bwct.de      http://www.fizon.de
bernd at bwct.de           info at bwct.de            support at fizon.de


More information about the freebsd-alpha mailing list