cvs commit: src/sys/alpha/alpha support.s src/sys/i386/i386 swtch.s src/sys/kern kern_shutdown.c src/sys/sys systm.h

Wed Jan 21 01:45:39 PST 2004

On Tue, Jan 20, 2004 at 10:29:29AM -0800, Bill Paul wrote:
> 
> [abuse of __FILE__ and __LINE__ in panic()]
>  
> > This was rejected in all reviews.  It gives less information than
> > grepping the sources, at some cost (grep at least gives correct line
> > numbers when the sources don't quite match the binary).
> > 
> > Please back this out.
> 
> I have to second this.
>  
> > >   Ideally a traceback should be printed too, any takers ?
> > 
> > This can be obtained by running a debugger on the panic dump.  Line
> > numbers and source file names can also be printed by the debugger
> > if the executable has at least line numbers in its debugging info.
> 
> That's fine for developers who always run their systems with crash
> dumps enabled and posess sufficient debugger fu to analyze them. What
> about ordinary users who encounter a kernel panic due to a trap
> and need to report it? It would save wear and tear on everyone's
> nerves (_ESPECIALLY_ mine) if they could just send in a traceback
> rather than developers being forced to do the usual "go back and
> do nm ${KERNEL} and tell us what you see" exchange.

It seems that instead of _TRYING_ to simplify everyone's life and being
smarter than our users, we need to improve on our users' general
bug/panic reporting "culture".  Probably some in-depth section in the
Handbook would be useful.

> 
> [...]
> 
> > Verbose panic messages, and lots of code to print out values of variables
> > just before panicing, are another mistake.  Short panic messages were
> > good enough when debuggers were primitive or nonexistent.  Verbose panic
> > messages are even less needed now.
> 
> Ok, hang on just a minute.
> 
> Let's be clear here. Adding linenum/sourcefile info to all panic()
> calls is kind of pointless, I agree. (If you're too lazy to grep
> the source for the panic message, you're just no damn good.) However,

Agreed.

> there are times when printing the contents of some variables in
> the panic string is _INCREDIBLY_ _USEFUL_. Terse panic messages
> are annoying -- for that matter so are terse error messages in general.
> Error messages are meant to a) inform the user that they need to
> perform some action to correct a problem and b) alert a software
> developer to a potential bug and, hopefully, how to fix it. It is
> not enough to note that an error occured: you have to explain
> _WHY_ it occured.
> 
> Really bad panic message:
> 
> 	panic("mbuf is corrupt (but I'm not telling you exactly "
> 	    "what's corrupt about it because god forbid I should "
> 	    "make life easier for you)");
> 
> Good panic message:
> 
> 	panic("mbuf is corrupt: mbuf %p has m_len of %d and m_flags of 0x%x "
> 	    "but m_data is NULL", m, m->m_len, m->m_flags);
> 
> By showing the contents of the suspicious variable/structure, you give
> a developer some clue as to what was going on in the system when the 
> variable/structure was trashed. Maybe m_len is the size of a protocol
> header that points to a particular protocol module being suspect.
> Maybe m_flags has M_DONGS set which points to code that Alfred wrote.
> It may not make sense to dump out every single piece of available state,
> but not providing any state at all is just damn rude.
> 
> Could you determine this from the crash dump? Sure -- but do you have
> any idea how much of a pain in the ass that can be, especially when
> the crash occurs on a system that isn't yours? By default, crash
> dumps are as large as physical memory, and most systems now have at
> least 128MB of RAM. The average user does not know how to analyze
> a crash dump, which means even if they get one, they won't be able to
> do anything with it, except send it to me. That means I'll have people
> throwing 128MB files at me, which I can't even analyze unless I happen
> to have a system handy running _exactly_ the same kernel as they are.
> This is an amazing amount of trouble to go through to determine a
> couple of pieces of info that could just as easily have been printed
> on the console, where the user could copy them down and include them
> in an e-mail.
> 
> Note that I am not saying we should immediately go back throug the
> kernel source and verbosify every single panic() call we come across.
> I _am_ saying that developers should not use existing panic() messages
> as models for their own. I am also not going to let you lead them
> down that path, for that way lies Linux^Wmadness.
> 
> Anyway, to reiterate: I don't think the lineno/sourcefile additions
> to panic() are worth the bloat. Tracebacks on the other hand, would
> be worth their weight in gold bikesheds.

Exactly right.  Methinks that lineno/sourcefile is rather easily
obtainable, but the state of a disaster is a lot harder nut to crack,
unaided.

./danfe