Yet another crash in FreeBSD 5.1

Terry Lambert tlambert2 at mindspring.com
Sat Aug 2 18:37:24 PDT 2003


Greg 'groggy' Lehey wrote:
> > The information I gave him gets him to lines of source code, instead
> > of just function names with strange hexadecimal numbers that resolve
> > to instruction offsets that may be specific to his compile flags,
> > date of checkout of the sources from CVS, etc.
> 
> The first step of the link above does the same thing.  But it's only
> the first step.

No, it does not.  The first step of your debugging link does
not deal with anything but having a vmcore lying around *which
he does not have*.


> Terry, why don't you come to my debug tutorial at the BSDCon next
> month?  I'll show you how to do this properly.  I'm not asking for
> people to interpret hex.  I'm asking for people, you included, to find
> out what debugging help is available.

I might do this; it depends on whether things die down at work
by then, or not.  Currently, though, I'm really busy fixing bugs
exactly like this one.  In the past 3 weeks, I've fixed 61 of them,
which averages out to 4 a day.

> > If it's a NULL pointer dereference, the place to find it is by
> > turning on what debugging there is, and, if that fails, which it
> > probably will,
> 
> No, that will find the null pointer dereference pretty quickly.

You'd hope the entirety of the kernel were that well instrumented...


> > by eyeballing the lines of source code in question and understanding
> > the code around it well enough that you can tell *how* a pointer
> > there could be NULL.  My instructions *get* him those lines of
> > source.
> 
> You obviously still haven't read the reference.  Do that first, and
> come back when you have either understood things or are having
> difficulty understanding.  But don't shoot off your mouth without
> knowing what's going on.

I read the reference.

How does it apply in cases like this one, where you don't have a
vmcore file?


> > If you'll notice from his followup posting of the source in
> > question, Vinum is loaded as a module, and it's the FreeBSD code
> > that Vinum calls, not Vinum, that's causing the crash.
> 
> The bug is almost certainly in Vinum.

Most likely; I think that it's passing a bad argument to the
inferior function.  The way I would approach finding this, with
only:

1)	The line of code where the failure occurred
2)	The stack traceback, with no arguments
3)	The sources for the code in the stack traceback

would be to eyeball the code in #1, and try to figure out how
I could get to that point with that pointer having a NULL value,
given my a priori knowledge of the forward call graph.

I would examine every intermediate conditional and function call
that could affect the value of the pointer and cause it to be
NULL at the point in question.
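
To make that concrete, here is a minimal sketch of the kind of path
I mean.  Every name in it (buf_like, req, prepare, consume, launch)
is hypothetical and made up for illustration; none of it is taken
from Vinum or from the FreeBSD sources.  It only shows how a single
conditional in an intermediate function can leave a pointer NULL by
the time the inferior function uses it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical structures, for illustration only. */
struct buf_like {
	int	 flags;
	void	*data;
};

struct req {
	struct buf_like	*bp;	/* starts out NULL */
	int		 nsubs;	/* number of subrequests */
};

/*
 * Intermediate step in the call graph.  The only assignment to
 * rq->bp sits behind a conditional, so when nsubs is 0 the pointer
 * keeps its initial NULL value.
 */
static void
prepare(struct req *rq, struct buf_like *candidate)
{
	assert(rq != NULL);
	if (rq->nsubs > 0)
		rq->bp = candidate;
}

/*
 * The "inferior" function.  A crashing version would dereference
 * rq->bp unconditionally; this one reports the bad argument with -1
 * so that the path can be exercised without faulting.
 */
static int
consume(const struct req *rq)
{
	if (rq->bp == NULL)
		return (-1);
	return (rq->bp->flags);
}

/* Top of the hypothetical call chain: launch -> prepare -> consume. */
static int
launch(int nsubs, struct buf_like *candidate)
{
	struct req rq = { NULL, nsubs };

	prepare(&rq, candidate);
	return (consume(&rq));
}
```

Calling launch(1, &b) returns b.flags, while launch(0, &b) returns
-1, because prepare() never assigned the pointer.  Reading the real
code for exactly this sort of skipped assignment is what I mean by
examining every intermediate conditional.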


> This has nothing to do with being paranoid about babies.  This has to
> do with people shooting off their mouths in a public forum without
> bothering to check details first.

It's really hard to talk to you about Vinum.

One of the details I wish you would check is whether or not he
has a vmcore file, or the ability to get one...

-- Terry

