bad RAM? prove it with a crash dump?

Nate Eldredge nate at thatsmathematics.com
Thu May 6 15:26:23 UTC 2010


On Thu, 6 May 2010, Andrew Duane wrote:

> It is also useful to make sure that the garbage itself is different. As 
> mentioned before, a single bit error in an otherwise valid value, or 
> maybe a missing/scrambled byte, these are good indications of memory 
> problems. If random places are often overwritten with something else, 
> that could just be another piece of misbehaving code that is writing 
> someplace it shouldn't. I've often found code that writes some buffer 
> into e.g. a piece of memory it no longer owns that looks like memory 
> corruption until you realize the garbage is always something specific 
> like a vnode structure.
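
To make that distinction concrete, here is a minimal sketch in Python. It
assumes you already know what value should have been at the corrupted
location, for example from a second copy of the structure. The function
name, the labels it returns, and the 64-bit default width are all made up
for illustration and do not come from any crash dump tool.

```python
def classify_corruption(expected: int, observed: int, width: int = 64) -> str:
    """Describe how `observed` differs from `expected` (unsigned, `width` bits)."""
    mask = (1 << width) - 1
    diff = (expected ^ observed) & mask  # set bits mark positions that differ
    if diff == 0:
        return "identical"
    if bin(diff).count("1") == 1:
        # One wrong bit in an otherwise valid value: typical of bad memory.
        return "single-bit flip"
    lowest = (diff & -diff).bit_length() - 1  # index of lowest differing bit
    highest = diff.bit_length() - 1           # index of highest differing bit
    if lowest // 8 == highest // 8:
        # All damage confined to one byte: a scrambled byte.
        return "single-byte damage"
    # Wholesale replacement: more likely stray writes from other code.
    return "multi-byte difference"
```

A result of "multi-byte difference" only says the value was replaced
wholesale; you still have to look at the garbage itself to see whether it
is always the same kind of object.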

There are trickier things too.  I once had a machine with bad cache memory 
where once in a while you would get a cache line that had come from 
somewhere else in memory.  This was particularly vexing when it happened 
to an I/O buffer, and I wound up with a large zip file that had 32 bytes 
of libc.so somewhere in the middle... :-(

And of course, swapping out the RAM wouldn't have fixed it, since the fault
was in the cache rather than in main memory.
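
If you happen to have a known-good copy of the damaged file, this kind of
corruption is easy to spot mechanically. The following is only a sketch in
Python under that assumption: the function name is hypothetical, the 32-byte
block size is taken from the zip file incident, and `suspect` stands for the
contents of whatever file you think the stray bytes came from (libc.so in
my case).

```python
LINE = 32  # block size from the anecdote; other hardware will differ

def misplaced_lines(good: bytes, bad: bytes, suspect: bytes, line: int = LINE):
    """List (offset, suspect_offset) for each aligned block of `bad`
    that differs from the same block of `good`.

    suspect_offset is where the damaged block occurs in `suspect`,
    or -1 if it does not occur there at all.
    """
    hits = []
    for off in range(0, min(len(good), len(bad)), line):
        block = bad[off:off + line]
        if block != good[off:off + line]:
            hits.append((off, suspect.find(block)))
    return hits
```

A damaged block that turns up verbatim in another file points at data being
fetched from the wrong place, whereas a block that differs by a bit or two
and matches nothing (suspect_offset of -1) looks more like an ordinary
memory error.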

-- 

Nate Eldredge
nate at thatsmathematics.com

