debugging frequent kernel panics on 8.2-RELEASE
avg at FreeBSD.org
Mon Aug 15 08:31:45 UTC 2011
on 14/08/2011 17:43 Steven Hartland said the following:
> ----- Original Message ----- From: "Andriy Gapon" <avg at FreeBSD.org>
>> Maybe test it on couple of machines first just in case I overlooked something
>> essential, although I have a report from another user that the patch didn't break
>> anything for him (it was tested for an unrelated issue).
> We've got this running on ~40 machines and just had the first panic
> since the update. Unfortunately it doesn't seem to have changed anything :(
> We have 352 thread entries starting with:-
> #0 sched_switch (td=0xffffffff8083e4e0, newtd=0xffffff0012d838c0,
> flags=Variable "flags" is not available.
> 23 with:-
> cpustop_handler () at atomic.h:285
> and 16 with:-
> #0 fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:562
I would like to get a full output of thread apply all bt.
> The main message being:-
> panic: double fault
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd"...
> Unread portion of the kernel message buffer:
> <118>Aug 14 15:13:33 amsbld15 syslogd: exiting on signal 15
So this line, does it indicate a shutdown of a jail or of the whole system?
> Fatal double fault
> rip = 0xffffffff8053b691
Can you please provide output of 'list *0xffffffff8053b691' in kgdb?
> rsp = 0xffffff8d8f356fb0
> rbp = 0xffffff8d8f357210
> cpuid = 2; apic id = 02
> panic: double fault
> cpuid = 2
> KDB: stack backtrace:
> #0 0xffffffff803bb75e at kdb_backtrace+0x5e
> #1 0xffffffff8038956e at panic+0x2ae
> #2 0xffffffff805802b6 at dblfault_handler+0x96
> #3 0xffffffff8056900d at Xdblfault+0xad
I think (not 100% sure) that with DDB in the kernel we could get a better backtrace
here, possibly with pre-dblfault stack frames, because the DDB backend is a bit
smarter than the trivial stack(9) printer.
> stack: 0xffffff8d8f357000, 4
One thing I can say is that this looks like a double fault caused by stack
exhaustion (the most typical cause): the rsp value is below td_kstack.
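
The comparison behind that reading can be checked with a few lines of arithmetic. This is a minimal sketch in Python that uses only the rsp and "stack:" values from the panic output quoted in this message. It assumes the "stack:" value is the lowest address of the thread's kernel stack (td_kstack) and that the stack grows downward. The variable names are illustrative only.

```python
# Values copied from the panic output quoted in this message.
rsp = 0xffffff8d8f356fb0        # stack pointer at the time of the double fault
td_kstack = 0xffffff8d8f357000  # "stack:" value, assumed to be the lowest stack address

# The kernel stack grows downward, so a stack pointer below the base
# address means the thread has run past the end of its stack.
below_base = rsp < td_kstack
overrun = td_kstack - rsp

print(f"rsp below td_kstack: {below_base}, by {overrun} bytes")
# prints "rsp below td_kstack: True, by 80 bytes"
```
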
Can you please also provide the following information:
p *((struct pcb *)((char *)0xffffff8d8f357000 + KSTACK_PAGES * PAGE_SIZE) - 1)
where KSTACK_PAGES is the value of the KSTACK_PAGES kernel option (amd64 default
is 4) and PAGE_SIZE is 4096.
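
For reference, the address arithmetic in that expression can be worked out by hand. This is a rough sketch in Python, assuming the default KSTACK_PAGES of 4. The size of struct pcb depends on the kernel build, so the hypothetical helper pcb_address() takes it as a parameter rather than guessing it.

```python
PAGE_SIZE = 4096
KSTACK_PAGES = 4  # amd64 default; use the value from your own kernel config

def pcb_address(td_kstack, sizeof_pcb, kstack_pages=KSTACK_PAGES):
    """Mirror the C expression: treat the top of the stack as a
    (struct pcb *) and step back by one element."""
    stack_top = td_kstack + kstack_pages * PAGE_SIZE
    return stack_top - sizeof_pcb

# Top of the kernel stack for the "stack:" value in the panic output.
stack_top = 0xffffff8d8f357000 + KSTACK_PAGES * PAGE_SIZE
print(hex(stack_top))
# prints "0xffffff8d8f35b000"
```

kgdb does the sizeof(struct pcb) subtraction itself when evaluating the C expression, so the helper is only useful for cross-checking the address it prints.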
> rsp = 0xffffff800009ae10
> There are some indications that stopping jails could be the
> cause of the panics, so on one test box I've added in invariants
> to see if anything shows up from that.