Repeated similar panics on -STABLE

Sun Apr 20 01:16:29 PDT 2003

On 20 Apr, Dmitry Sivachenko wrote:
> On Sat, Apr 19, 2003 at 03:15:20PM -0700, Don Lewis wrote:
>> Dmitry,
>> 
>> If you still have the core and kernel files, I'd appreciate it if you
>> could point gdb at them and print the following stuff from the malloc()
>> stack frame.
>> 
>> 	indx
>> 	&bucket[0]
>> 	kbp
>> 	*kpb
>> 	allocsize
>> 	npg
>> 	cp
>> 	freep
>> 	*freep
>> 	va
>> 
> 
> In backtrace I posted malloc was called twice.  Since I am not sure
> which one you are interested in, I am sending these values for both.

The second one, which is where the first trap occurred.

> (kgdb) up 6
> #6  0xc015daff in malloc (size=128, type=0xc02aca60, flags=2)
>     at /mnt/se3/releng_4/src/sys/kern/kern_malloc.c:243
> 243             va = kbp->kb_next;
> (kgdb) p indx
> $1 = 7
> (kgdb) p &bucket[0]
> $2 = (struct kmembuckets *) 0xc02bcee0
> (kgdb) p kbp
> $3 = (struct kmembuckets *) 0x5cdd8000
> (kgdb) p *kpb
> No symbol "kpb" in current context.
> (kgdb) p *kbp
> Cannot access memory at address 0x5cdd8000.
> (kgdb) p allocsize
> $4 = -730301488
> (kgdb) p npg
> $5 = 0
> (kgdb) p cp
> $6 = 0x0
> (kgdb) p freep
> $7 = (struct freelist *) 0x0
> (kgdb) p va
> $8 = 0x5cdd8000 <Address 0x5cdd8000 out of bounds>
> (kgdb)
> 
> 
> (kgdb) up 16
> #22 0xc015daff in malloc (size=72, type=0xc029fee0, flags=0)
>     at /mnt/se3/releng_4/src/sys/kern/kern_malloc.c:243
> 243             va = kbp->kb_next;
> (kgdb) p indx
> $9 = 7
> (kgdb) p &bucket[0]
> $10 = (struct kmembuckets *) 0xc02bcee0
> (kgdb) p kbp
> $11 = (struct kmembuckets *) 0x5cdd8000

That's not good ... if indx is 7, and sizeof(struct kmembuckets) is 36,
I believe that
	kbp = &bucket[indx];
should assign 0xc02bcfdc to kbp.  It looks like kbp might be getting
stomped on somehow.
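
The arithmetic behind that expected address can be sketched in a few lines
of C. This is only an illustration: element_addr() and expected_kbp() are
hypothetical helper names, the base address and indx are taken from the
kgdb output above, and the element size of 36 is the figure quoted in this
message rather than something measured here.

```c
#include <assert.h>
#include <stdint.h>

/* Address of element indx in an array of elem_size-byte entries at base. */
uint32_t
element_addr(uint32_t base, uint32_t indx, uint32_t elem_size)
{
	assert(elem_size > 0);
	return base + indx * elem_size;
}

/* Expected kbp for the values printed in the kgdb session. */
uint32_t
expected_kbp(void)
{
	/* &bucket[0] = 0xc02bcee0, indx = 7, element size = 36 (assumed) */
	return element_addr(0xc02bcee0u, 7, 36);
}
```

With those inputs the offset is 7 * 36 = 252 = 0xfc, so the result is
0xc02bcfdc, nowhere near the 0x5cdd8000 that kgdb reports for kbp.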

> (kgdb) p *kbp
> Cannot access memory at address 0x5cdd8000.

Yeah, kbp is pointing at a non-existent memory location.

> (kgdb) p allocsize
> $12 = -349729108
> (kgdb) p npg
> $13 = 0

Since allocsize is garbage that doesn't show any relationship to
size, and since npg isn't consistent with either size or allocsize, it
looks to me like kbp->kb_next is initially non-NULL, so we're not
executing the "if" block.

        if (kbp->kb_next == NULL) {
                kbp->kb_last = NULL;
                if (size > MAXALLOCSAVE)
                        allocsize = roundup(size, PAGE_SIZE);
                else
                        allocsize = 1 << indx;
                npg = btoc(allocsize);
                va = (caddr_t) kmem_malloc(kmem_map, (vm_size_t)ctob(npg), flags);

The other possibility is that these are also getting smashed somehow.
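
For comparison, here is a rough sketch of what allocsize and npg would hold
if the "if" block had actually run for these calls. The sk_ names are
hypothetical stand-ins for the kernel macros, and the 4 KB page size and the
MAXALLOCSAVE value of two pages are assumptions about this i386 kernel, not
values taken from the core file.

```c
#include <assert.h>

#define SK_PAGE_SHIFT	12			/* assumed: 4 KB pages */
#define SK_PAGE_SIZE	(1UL << SK_PAGE_SHIFT)
#define SK_MAXALLOCSAVE	(2 * SK_PAGE_SIZE)	/* assumed value */

/* Round x up to a multiple of y, in the manner of roundup(). */
unsigned long
sk_roundup(unsigned long x, unsigned long y)
{
	assert(y > 0);
	return ((x + y - 1) / y) * y;
}

/* Convert bytes to pages, rounding up, in the manner of btoc(). */
unsigned long
sk_btoc(unsigned long bytes)
{
	return (bytes + SK_PAGE_SIZE - 1) >> SK_PAGE_SHIFT;
}

/* allocsize as the "if" block would compute it for a given size and indx. */
unsigned long
sk_allocsize(unsigned long size, int indx)
{
	assert(indx >= 0 && indx < 31);
	if (size > SK_MAXALLOCSAVE)
		return sk_roundup(size, SK_PAGE_SIZE);
	return 1UL << indx;
}
```

Under those assumptions, size=72 and size=128 with indx=7 both give an
allocsize of 128 and an npg of 1, which matches neither the negative
allocsize values nor the npg of 0 seen in the two frames.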

> (kgdb) p cp
> $14 = 0x0
> (kgdb) p freep
> $15 = (struct freelist *) 0x0

If we don't get into the "if" block, these will contain leftover
garbage from the stack.

> (kgdb) p va
> $16 = 0x5cdd8000 <Address 0x5cdd8000 out of bounds>

Interesting ... how does va get the same value as kbp?  Is it coming
from the
	va = (caddr_t) kmem_malloc()
assignment in the "if" block that we don't think we're executing, or the
	va = kbp->kb_next;
assignment that is causing the panic?

If kbp is pointing to a non-existent page, why does Terry's patch seem
to fix the problem for you?

I wonder if things are getting further munged after the trap occurs?
That would make it more difficult to track down the problem from the
core file.

Something else of interest to print is
	bucket[7]
bucket[7].kb_next and bucket[7].kb_last might shed some light.

Something interesting is that the fault address, and the values of va
and kbp are the same in both stack frames ...

One other question ... is your kernel compiled with INVARIANTS?  That
changes the definition of struct freelist.
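
To illustrate why that matters when reading freep and *freep in the
debugger, the sketch below shows how a conditionally compiled structure can
move the free-list link to a different offset. Both layouts are
hypothetical: the struct and field names are placeholders for illustration
and are not copied from kern_malloc.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout without debugging fields: the link comes first. */
struct freelist_plain {
	char	*next;
};

/* Hypothetical layout with extra debugging fields ahead of the link. */
struct freelist_debug {
	long	 spare0;
	void	*type;
	long	 spare1;
	char	*next;
};

/* Byte offset of the link pointer in each variant. */
size_t
plain_next_offset(void)
{
	return offsetof(struct freelist_plain, next);
}

size_t
debug_next_offset(void)
{
	return offsetof(struct freelist_debug, next);
}

/* Nonzero if the two variants disagree about where the link lives. */
int
layouts_differ(void)
{
	assert(plain_next_offset() == 0);
	return plain_next_offset() != debug_next_offset();
}
```

If the kernel and the debugger's idea of the structure disagree in this
way, the value printed as the next pointer would come from the wrong bytes
of the free block.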