FreeBSD and ECC memory?

Erik Trulsson ertr1013 at student.uu.se
Fri Jul 25 13:00:58 UTC 2008


On Fri, Jul 25, 2008 at 08:42:54AM -0400, Michael Powell wrote:
> Nejc Škoberne wrote:
> 
> > Hello,
> > 
> > I am buying hardware for a FreeBSD server, and my friend and I are arguing
> > about whether or not to buy ECC RAM for it. It is an HP ProLiant ML110 G4
> > machine and currently it has 2 x 512MB of HP DDR2 ECC memory.
> > 
> > My friend says buying ECC memory is not wise, because we would not profit
> > from it since this server will not need very high availability (but still
> > we'd like to make it a solid server). And also that ECC memory slows down
> > memory operations by 2-3% altogether. Also, we would profit from buying
> > non-ECC memory because we already have 2 x 1GB non-ECC memory and if we:
> > 
> >   - buy extra 2 x 1GB non-ECC memory we'll have 4GB altogether (4 x 1GB)
> >   - buy extra 2 x 1GB ECC memory we'll have 3GB altogether (2 x 512MB +
> >     2 x 1GB)
> > 
> > 1. So, what would you base your decision on? Is getting ECC worth losing
> > 1GB of non-ECC memory?
> 
> My decision would be based upon what the server was going to be used for.
> Home use, or non "mission critical" I'd say non-ECC is just fine. At work
> for "mission critical" database, mail, etc I stick with ECC. Especially
> when it comes to Windows, as Windows has a nasty habit of trying to mask
> what's going on behind the scene. No way I'd run a large SQL database or
> Exchange server without ECC.
> 
> I'd be more concerned with trying to buy all the memory at the same time so
> the sticks were all identical, especially with regard to timing and speed
> ratings. You can create a problem when you have stick(s) from one
> manufacturer then add in different ones later. IMHO, in this particular
> situation, my "gut" feeling from your description would be to go with the
> 4GB of non-ECC as it sounds like the scenario doesn't match the criteria I
> use for justifying ECC as a "must have".
> 
> > 2. What are your experiences with ECC?
> > 3. Has anyone here ever had a machine halt itself because of a memory
> > error (with ECC memory installed)?
> 
> If it does you have defective hardware that is in need of replacement. Yes,
> I have had bad RAM; whether it's ECC or non-ECC isn't the issue when it is
> simply defective.
> 
> > 4. If there is non-ECC memory installed, how does FreeBSD recognize
> > (correct?) memory errors?
> > 
> 
> Generally speaking this occurs more at the hardware level. Non-ECC RAM can
> correct single bit errors while ECC is capable of fixing multi-bit errors.

No, non-ECC RAM cannot detect or correct any errors at all. (Old parity RAM
could detect, but not correct, single-bit errors.)
ECC is generally capable of detecting double-bit errors and correcting
single-bit errors. (There are different ways of implementing ECC. Some of
them might well be able to fix multi-bit errors too.)
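
To make the "correct single-bit, detect double-bit" behaviour concrete, here
is a minimal sketch in Python using an extended Hamming(8,4) code. This is a
toy illustration only: the function names are made up for this example, and
real ECC memory uses a wider code (typically 8 check bits per 64 data bits)
implemented in the memory controller hardware, not in software.

```python
# Toy SECDED (single-error-correct, double-error-detect) demo using an
# extended Hamming(8,4) code. All names are hypothetical, for illustration.

def parity(bits):
    """Return the XOR of all bits (0 if an even number of bits are set)."""
    p = 0
    for b in bits:
        p ^= b
    return p

def encode(data):
    """Encode 4 data bits into 8 bits: [overall, p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # checks Hamming positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # checks Hamming positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # checks Hamming positions 4, 5, 6, 7
    word = [0, p1, p2, d1, p4, d2, d3, d4]  # list index == Hamming position
    word[0] = parity(word[1:])              # extra overall parity bit
    return word

def decode(word):
    """Return (status, data); status is 'ok', 'corrected' or 'uncorrectable'."""
    w = list(word)
    syndrome = 0
    for pos in range(1, 8):
        if w[pos]:
            syndrome ^= pos  # XOR of the positions of all set bits
    overall = parity(w)      # 0 for an undamaged codeword
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: treat as a single-bit error at 'syndrome'
        # (syndrome 0 means the overall parity bit itself was hit).
        w[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome but even overall parity: two bits were flipped.
        return "uncorrectable", None
    return status, [w[3], w[5], w[6], w[7]]

data = [1, 0, 1, 1]
word = encode(data)

one_flip = list(word)
one_flip[5] ^= 1           # flip a single bit

two_flips = list(word)
two_flips[5] ^= 1          # flip two different bits
two_flips[6] ^= 1

print(decode(word))        # ('ok', [1, 0, 1, 1])
print(decode(one_flip))    # ('corrected', [1, 0, 1, 1])
print(decode(two_flips))   # ('uncorrectable', None)
```

Old parity RAM corresponds to keeping only the overall parity bit: it can
tell that an odd number of bits changed, but not which one.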

> However, should I become aware that ECC was "fixing" too many errors too
> often I would consider there to be defective hardware present.
> 
> The purpose of these schemes is to compensate for the fact that once in
> every N memory transactions (for some large number N) a bit may get
> flipped. If this is happening more often than that, then there is a
> defect present. ECC just buys you "uptime" in the event there are more
> errors than there should be.

Note that random, spontaneous bit flips can happen (infrequently) even in
perfectly good RAM, due to cosmic rays, radioactive decay in surrounding
material, and similar causes. (No, I am not joking.) ECC will handle
such errors just fine, and that is the main reason why I would want ECC.

You can also get defective memory modules, but these can usually be detected
by running memtest86 or similar. ECC can usually cope with memory modules
that have some bits more or less permanently wrong, but such modules should
be replaced as soon as possible.
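
As a rough sketch of the idea behind such memory testers (write known
patterns, read them back, report mismatches), here is a small Python
simulation. It is not how memtest86 itself is implemented: the SimulatedRam
class and pattern_test function are invented for this example, and real
testers run many more patterns against the actual hardware.

```python
# Toy simulation of a pattern-based memory test. SimulatedRam and
# pattern_test are hypothetical names, used only for illustration.

class SimulatedRam:
    """Byte-addressable toy memory that can model bits stuck at 0 or 1."""

    def __init__(self, size, stuck=None):
        self.cells = bytearray(size)
        # Maps address -> (bit_index, stuck_value) for simulated defects.
        self.stuck = stuck or {}

    def write(self, addr, value):
        if addr in self.stuck:
            bit, stuck_value = self.stuck[addr]
            if stuck_value:
                value |= 1 << bit            # bit is always read back as 1
            else:
                value &= ~(1 << bit) & 0xFF  # bit is always read back as 0
        self.cells[addr] = value

    def read(self, addr):
        return self.cells[addr]

def pattern_test(ram, size, patterns=(0x00, 0xFF, 0x55, 0xAA)):
    """Write each pattern everywhere, read it back, return bad addresses."""
    bad = set()
    for pattern in patterns:
        for addr in range(size):
            ram.write(addr, pattern)
        for addr in range(size):
            if ram.read(addr) != pattern:
                bad.add(addr)
    return sorted(bad)

good = SimulatedRam(1024)
faulty = SimulatedRam(1024, stuck={300: (3, 1)})  # bit 3 at address 300 stuck at 1

print(pattern_test(good, 1024))    # []
print(pattern_test(faulty, 1024))  # [300]
```

A permanently stuck bit like this shows up on every pass, which is what
distinguishes a defective module from the rare random bit flips mentioned
above.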

> 
> In either case these bit flips should only happen extremely infrequently, if
> ever at all. Consider that these schemes are sort of a fallback to an
> extreme "what if" situation and really shouldn't come into play during most
> nominal operations. I would go with ECC for something that just had to
> stay "up" even in the face of errors. In either case I'd still replace the
> defective component(s), regardless of whether they were ECC or not. I've
> seen thousands of machines with non-ECC RAM over the last 15 years that
> worked just fine.




-- 
<Insert your favourite quote here.>
Erik Trulsson
ertr1013 at student.uu.se


More information about the freebsd-questions mailing list