System, diagnose thyself: auto-documentation for crashes

Ivan Voras ivoras at freebsd.org
Sat Aug 30 11:27:18 UTC 2008


Kirk Strauser wrote:
> I was having flaky system problems that were driving me to distraction. 
> Yesterday, I finally got a panic message with an instruction pointer,
> used addr2line to see that the failure was in uma_zfree_internal,
> searched Google, and learned that it was probably due to bad RAM.  Half
> an hour later, memtest86 found the defective stick and the problem was
> solved.
> 
> This led me to thinking, though: the OS already had all the information
> needed to figure out where the problem was.  If there had been an
> explanation inside that function definition, FreeBSD could have
> automatically gone to the file, searched for that explanation, and told
> me why my system had probably crashed.

There's a "small" problem here: to validate something like this you
need an AI or at least an expert system. It's pure coincidence that
you found someone else with bad RAM crashing in the same function, and
by itself it doesn't mean anything. The exact same failure could be
caused by almost any serious problem:

* bad CPU or overheating
* bad motherboard/bus
* compiler generating bad code
* simply, a code bug.

The next time someone reads about "crash in uma_zfree_internal", he could
have an overheated CPU and spend days swapping and testing RAM :)

On the other hand, bad RAM can manifest in practically infinite ways,
as you discovered before you hit uma_zfree_internal.
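The argument can be made concrete with a toy version of the proposed
lookup. This is a hypothetical Python sketch, not anything that exists in
FreeBSD: GENERIC_CAUSES, diagnose() and the annotation table are invented
for illustration, and "some_other_function" is a made-up crash site. The
assumption is that the causes listed above (plus bad RAM) can produce a
panic at any crash site, so a per-function note can only add candidates,
never remove them.

```python
# Hypothetical: causes that can produce a panic at *any* crash site
# (the list above, plus bad RAM). Not a real FreeBSD facility.
GENERIC_CAUSES = [
    "bad RAM",
    "bad CPU or overheating",
    "bad motherboard/bus",
    "compiler generating bad code",
    "a code bug",
]


def diagnose(crash_function: str, annotations: dict[str, list[str]]) -> list[str]:
    """Return candidate causes for a panic in crash_function.

    Per-function annotations are only extra hints: they are listed first,
    but they cannot rule out the generic causes, which are always appended.
    Duplicates are dropped while preserving order.
    """
    candidates = annotations.get(crash_function, []) + GENERIC_CAUSES
    return list(dict.fromkeys(candidates))


notes = {"uma_zfree_internal": ["bad RAM"]}
print(diagnose("uma_zfree_internal", notes))
print(diagnose("some_other_function", notes))
```

Both calls print the same five candidates: annotating uma_zfree_internal
with "bad RAM" neither narrows nor reorders the list, which is why a
crash location alone is not a diagnosis.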



More information about the freebsd-current mailing list