Repeated similar panics on -STABLE
Dmitry Sivachenko
demon at FreeBSD.org
Sun Apr 20 04:04:53 PDT 2003
On Sun, Apr 20, 2003 at 03:18:05AM -0700, Don Lewis wrote:
> On 20 Apr, Dmitry Sivachenko wrote:
> > On Sun, Apr 20, 2003 at 01:16:16AM -0700, Don Lewis wrote:
>
> >> If kbp is pointing to a non-existent page, why does Terry's patch seem
> >> to fix the problem for you?
> >
> > Well, there is probably a misunderstanding here.
> > We did NOT apply Terry's patch. Let me quote a bit from my e-mail to Terry:
> >
> > TL> Did my patch fix your problem?
> > TL>
> > TL> Or did you tune your kernel, as I suggested, to fix your problem?
> > TL>
> > TL> Or is it still a problem?
> >
> > DS>We changed maxusers from 512 to 0 and decreased the number of
> > DS>NMBCLUSTERS. Now everything is working fine, but since these panics occurred
> > DS>about once a week I can't say for sure they are completely gone.
> > DS>Let's wait at least one more week...
> >
> > Thus I wanted to say that we only tuned maxusers and NMBCLUSTERS. We
> > run a virgin -STABLE kernel without any patches. Probably my English leaves much
> > to be desired ;-((
>
> Your English seems just fine to me.
>
> I just got the impression from Terry that the patch is what fixed the
> problem for you.
>
>
> >> I wonder if things are getting further munged after the trap occurs?
> >> That would make it more difficult to track down the problem from the
> >> core file.
> >>
> >> Something else of interest to print is
> >> bucket[7]
> >> bucket[7].kb_next and bucket[7].kb_last might shed some light.
> >>
> >
> > (kgdb) up 22
> > #22 0xc015daff in malloc (size=72, type=0xc029fee0, flags=0)
> > at /mnt/se3/releng_4/src/sys/kern/kern_malloc.c:243
> > 243 va = kbp->kb_next;
> > (kgdb) p bucket[7]
> > $1 = {kb_next = 0x5cdd8000 <Address 0x5cdd8000 out of bounds>,
> > kb_last = 0xc8fcb000 "", kb_calls = 2127276, kb_total = 4256,
> > kb_elmpercl = 32, kb_totalfree = 1264, kb_highwat = 160, kb_couldfree = 5497}
> > (kgdb) p bucket[7].kb_next
> > $2 = 0x5cdd8000 <Address 0x5cdd8000 out of bounds>
> > (kgdb) p bucket[7].kb_last
> > $3 = 0xc8fcb000 ""
> > (kgdb)
>
> That explains quite a bit. The free list is somehow getting
> corrupted. That's why the 0x5cdd8000 value shows up in both stack
> frames. The value of kb_last looks ok, though. Because kb_next is not
> NULL, we skip the "if" block that allocates more memory and proceed to
> line 243. Gdb is lying a bit, though: the trap isn't happening on line
> 243, va is just getting the garbage value there. The trap is actually
> happening on the next line when we try to dereference this garbage
> pointer:
>         kbp->kb_next = ((struct freelist *)va)->next;
>
> It sure would be nice to know the source of this weird value. It's
> obviously not a pointer, but it's not obvious to me what it might be.
>
> It sure looks to me like something is writing to memory that has already
> been put back on the free list and is stomping on the next pointer in
> one of the memory blocks on the list. When this block gets allocated
> again, malloc() does:
>         va = kbp->kb_next;
>         kbp->kb_next = ((struct freelist *)va)->next;
> and stores the garbage in kb_next, where we trip over it on the next
> allocation.
>
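To make the sequence above concrete, here is a minimal user-space sketch of
it. This is not the kernel code: struct bucket_sketch, sketch_free(),
sketch_alloc() and demo_stale_write() are hypothetical names invented for
illustration, and the list is simplified to LIFO order. It only shows how a
write through a stale pointer to an already-freed block ends up in kb_next
after the next allocation, which itself still succeeds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for the kernel structures (assumption, not the
 * real definitions from kern_malloc.c). */
struct freelist {
    struct freelist *next;      /* link stored inside the free block */
};

struct bucket_sketch {
    struct freelist *kb_next;   /* head of the free list */
};

/* Put a block back on the free list (LIFO, for brevity). */
static void sketch_free(struct bucket_sketch *kbp, void *block)
{
    struct freelist *fl = block;

    assert(block != NULL);
    fl->next = kbp->kb_next;
    kbp->kb_next = fl;
}

/* The same two steps as the lines quoted from malloc(). */
static void *sketch_alloc(struct bucket_sketch *kbp)
{
    void *va = kbp->kb_next;

    if (va == NULL)
        return NULL;
    kbp->kb_next = ((struct freelist *)va)->next;
    return va;
}

/* Returns what is left in kb_next after a stale write and ONE allocation.
 * Assumes a flat address space where uintptr_t and pointers share a
 * representation, as on common platforms. */
static uintptr_t demo_stale_write(void)
{
    static union { struct freelist fl; unsigned char bytes[72]; } a, b;
    struct bucket_sketch bucket = { NULL };
    unsigned char *stale = a.bytes;     /* pointer kept past the free */
    uintptr_t garbage = (uintptr_t)0x5cdd8000u;
    void *va;

    sketch_free(&bucket, &b);
    sketch_free(&bucket, &a);           /* list is now a -> b -> NULL */

    /* The bug being modeled: data written into a block that is free.
     * The first bytes of the block hold the next pointer. */
    memcpy(stale, &garbage, sizeof garbage);

    va = sketch_alloc(&bucket);         /* succeeds and hands out a */
    assert(va == (void *)&a);

    /* The garbage now sits in kb_next; dereferencing it on the following
     * allocation is where the trap would happen, so we stop here. */
    return (uintptr_t)bucket.kb_next;
}
```

Calling demo_stale_write() returns the value that the next allocation would
try to dereference. The write and the eventual trap are two allocations
apart, which is why the core file shows the damage but not the culprit.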
>
> >> One other question ... is your kernel compiled with INVARIANTS? That
> >> changes the definition of struct freelist.
> >
> > Without.
>
> This could potentially be difficult to track down. Probably the best
> bet is to go back to the previous configuration and compile with
> INVARIANTS and hope that this will catch the problem a bit closer to the
> source.
>
OK, I'll restore our previous configuration tomorrow and add INVARIANTS.
I'll let you know when a fresh crash dump is ready.
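
For reference, the general idea behind this kind of debugging check can be
sketched in user space as follows. This is only an illustration under my own
assumptions, not the INVARIANTS code itself: the fill pattern, the block
layout and the names poison_free(), poison_check_head() and
demo_poison_detects() are all hypothetical. Freed blocks are filled with a
known pattern, and the pattern is verified before a block is handed out
again, so a write to freed memory is reported at allocation time instead of
surfacing later as a garbage pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_POISON 0xdeadc0deu   /* illustrative fill pattern */
#define SKETCH_WORDS  16            /* payload size in 32-bit words */

/* Hypothetical block layout: list link first, then the checked payload. */
struct pblock {
    struct pblock *next;
    uint32_t body[SKETCH_WORDS];
};

static struct pblock *phead;        /* head of the sketch free list */

/* Fill the payload with the pattern, then link the block onto the list. */
static void poison_free(struct pblock *b)
{
    size_t i;

    assert(b != NULL);
    for (i = 0; i < SKETCH_WORDS; i++)
        b->body[i] = SKETCH_POISON;
    b->next = phead;
    phead = b;
}

/* Returns 1 if the block at the head is intact (or the list is empty),
 * 0 if its payload was modified while it sat on the free list. */
static int poison_check_head(void)
{
    size_t i;

    if (phead == NULL)
        return 1;
    for (i = 0; i < SKETCH_WORDS; i++)
        if (phead->body[i] != SKETCH_POISON)
            return 0;
    return 1;
}

/* Returns 1 if a write to a freed block is detected by the check. */
static int demo_poison_detects(void)
{
    static struct pblock x;

    phead = NULL;
    poison_free(&x);
    if (!poison_check_head())
        return 0;                   /* untouched block must pass */
    x.body[3] = 0x5cdd8000u;        /* stale write into freed memory */
    return poison_check_head() == 0;
}
```

Note that this sketch only checks the payload words, so a stale write that
touches nothing but the link field would still go unnoticed here.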