System, diagnose thyself: auto-documentation for crashes
kirk at strauser.com
Fri Aug 29 15:14:02 UTC 2008
I was having flaky system problems that were driving me to
distraction. Yesterday, I finally got a panic message with an
instruction pointer, used addr2line to see that the failure was in
uma_zfree_internal, searched Google, and learned that it was probably
due to bad RAM. Half an hour later, memtest86 found the defective
stick and the problem was solved.
This led me to thinking, though: the OS already had all the
information needed to figure out where the problem was. If there had
been an explanation inside that function definition, FreeBSD could
have automatically gone to the file, searched for that explanation,
and told me why my system had probably crashed.
I propose that we:
1) Settle on a standard comment format for metainformation. There are
already standards like Doxygen if we don't want to roll our own.
2) Write a program that takes an instruction pointer and outputs the
comment for the associated function.
3) Modify /etc/rc.d/savecore to run the program from #2.
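To give a feel for step 2, here is a rough sketch of the comment-lookup half. It assumes the instruction pointer has already been turned into a function name (for example with addr2line) and that definitions put the function name at the start of a line with the block comment directly above. The name comment_for_function and the SAMPLE text are made up for illustration; this is not an existing tool.

```python
import re
from typing import Optional


def comment_for_function(source: str, func_name: str) -> Optional[str]:
    """Return the block comment sitting directly above func_name's definition.

    Assumes the definition has the function name in column 0, so that
    indented calls and one-line prototypes are not matched.
    """
    definition = re.search(
        rf"^{re.escape(func_name)}\s*\(", source, re.MULTILINE
    )
    if definition is None:
        return None

    before = source[:definition.start()]
    end = before.rfind("*/")
    if end == -1:
        return None

    # Only a return-type line may sit between the comment and the name.
    # Any code punctuation means the comment belongs to something else.
    between = before[end + 2:]
    if any(ch in between for ch in ";{}"):
        return None

    start = before.rfind("/*", 0, end)
    if start == -1:
        return None

    # Strip the leading " * " decoration from each comment line.
    body = before[start + 2:end]
    lines = [re.sub(r"^\s*\*?\s?", "", line).rstrip()
             for line in body.splitlines()]
    return "\n".join(line for line in lines if line)


# Illustrative input only; the return type line is an assumption.
SAMPLE = """\
/*
 * Frees an item to an INTERNAL zone or allocates a free bucket
 * Failures in this function are commonly due to defective RAM.
 */
static void
uma_zfree_internal(uma_zone_t zone, void *item, void *udata,
    enum zfreeskip skip, int flags)
{
}
"""

if __name__ == "__main__":
    print(comment_for_function(SAMPLE, "uma_zfree_internal"))
```

A real version would also need to map the function back to its source file and cope with kernels built from sources that are no longer on disk, which this sketch ignores.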
For instance, suppose the comments in sys/vm/uma_core.c looked like:
/*
 * Frees an item to an INTERNAL zone or allocates a free bucket
 *
 *      zone   The zone to free to
 *      item   The item we're freeing
 *      udata  User supplied data for the dtor
 *      skip   Skip dtors and finis
 *
 * Failures in this function are commonly due to defective RAM.
 */
uma_zfree_internal(uma_zone_t zone, void *item, void *udata,
    enum zfreeskip skip, int flags)
If I'd seen that failure message in my syslog, I would have avoided a
few days of teeth gnashing. What do you think? I think something
like this could be extremely useful. Benefits:
- There would be zero impact on performance because it would only
touch comments, never running code.
- It would require minimal work.
- It could be done incrementally. Document known common failure
points and add others with time.
- It wouldn't affect any other systems.