Is this a sign of memory going bad?

Matt Emmerton matt at gsicomp.on.ca
Thu Nov 25 20:49:29 PST 2004


> On Thu, 25 Nov 2004, Haulmark, Chris wrote:
>
> > Someone broke the silence:
> >
> > > On Thu, Nov 25, 2004 at 04:05:53PM -0500, Lowell Gilbert wrote:
> > >> Jonathon McKitrick <jcm at FreeBSD-uk.eu.org> writes:
> > >>
> > >>> This is what I get from make buildworld.  I've gotten signal 10,
> > >>> 11, and now 5.
> > >>>
> > >>> Is this bad memory?
> > >>
> > >> That's a reasonable guess, but the only way to tell for sure is to
> > >> test it.
> > >
> > > Is there a port to do this, or do I have to take it out and take it
> > > somewhere else to get it tested?
> > >
> > > jm
> >
> > sysutils/memtest in the ports.
>
> I don't want to embarrass anyone here, but something needs to be said.
> Note this next sentence carefully: THERE IS NO SUCH THING AS A WORKING
> MEMORY TEST PROGRAM!!!
>
> Anyone who tells you otherwise is no friend of yours, because they are
> making your life hard.  It's very alluring to assume that programs written
> to do a job actually do that job, and most especially in the case of
> memory test, one would *really* **REALLY** wish that Chuck here was lying,
> because you honestly need a memory test program, but the truth is otherwise:
> memory test programs don't work.  At the very best, if they spend 30
> minutes carefully exercising memory, you get a result that is maybe 10%
> reliable and 90% wishful guessing.
>
> With that in mind, the very best memory test programs can sometimes
> confirm that memory you suspected was failing IS failing.  The
> opposite, proving that memory is good, is totally useless: a passing
> result tells you nothing at all about whether your memory is good.

And it's for this very reason that I keep a few extra sticks of memory
lying around my office.  When a system starts acting wonky (intermittent
crashes, especially under load such as a buildworld or heavy
spamassassin/razor activity), I take it offline, swap the memory, and see
if the bad behaviour continues.  If it does, I'm no worse off than before.
If it doesn't, I can say with pretty good confidence that the memory was
bad.

While this method may seem somewhat brute-force-ish, it's often much quicker
and easier than futzing around with memtest and guessing.  Given the cost of
memory these days, swapping it out is generally cheaper than random downtime
and recovering from crashes in a production environment.
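
To make the before/after comparison a bit less anecdotal, something like the
rough sketch below could run the same heavy job a few times and tally how each
run ended, once with the suspect memory and once after the swap.  Everything
here is an assumption for illustration: the helper names (run_once, summarize,
stress) are made up, the signal table covers only a few FreeBSD signal numbers,
and a job such as make will usually report a compiler killed by a signal as a
plain nonzero exit rather than as the signal itself.

```python
import subprocess
from collections import Counter

# A few signal numbers as defined on FreeBSD (assumption: the job runs there;
# other systems number some of these differently).
FREEBSD_SIGNALS = {4: "SIGILL", 5: "SIGTRAP", 10: "SIGBUS", 11: "SIGSEGV"}


def run_once(cmd):
    """Run cmd, discard its output, and return its return code.

    subprocess reports a process killed by signal N as return code -N.
    """
    result = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return result.returncode


def summarize(returncodes):
    """Count clean exits, ordinary failures, and signal deaths by name."""
    tally = Counter()
    for rc in returncodes:
        if rc == 0:
            tally["ok"] += 1
        elif rc < 0:
            tally[FREEBSD_SIGNALS.get(-rc, f"signal {-rc}")] += 1
        else:
            tally[f"exit {rc}"] += 1
    return dict(tally)


def stress(cmd, runs=5):
    """Run the same command several times and summarize how the runs ended."""
    return summarize(run_once(cmd) for _ in range(runs))
```

Calling stress(["make", "buildworld"], runs=3) from /usr/src before and after
the swap would give two tallies to compare.  A clean tally after the swap
supports the bad-memory theory; it still doesn't prove the new sticks are good.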

--
Matt Emmerton



