How to get anything useful out of kgdb?
Andriy Gapon
avg at FreeBSD.org
Fri Oct 2 21:30:28 UTC 2015
On 02/10/2015 19:12, John Baldwin wrote:
> On Friday, October 02, 2015 09:26:23 AM Andriy Gapon wrote:
>> On 15/05/2015 20:57, Ryan Stone wrote:
>>> *Sigh*, kgdb isn't unwinding the trap frame properly. You can try this to
>>> figure out where it was running:
>>
>> I wonder, what is a reason for this?
>> Can that be fixed in kgdb itself?
>> It seems that usually kgdb handles trap frames just fine, but not always?
>
> It should be fixable. If this doesn't work under the newer kgdb, let me know and I'll
> try to fix it.
Okay, letting you know :-)
The backtraces from the in-tree kgdb and the newer kgdb both abort at the same
frame (the output from the newer kgdb is in my message in another kgdb-related thread).
> I did fix a few edge cases with special frame handling in the
> newer kgdb, though those mostly had to do with fork_trampoline and possibly
> Xtimerint (and aside from fork_trampoline, I think the fixes were mostly for i386,
> where different handlers set up trapframes differently).
>
>>> That gives you the top of the callstack at the time that the core was
>>> taken. To get the rest of it, try:
>>>
>>> define trace_stack
>>> set $frame_ptr=$arg0
>>> set $iters=0
>>> while $frame_ptr != 0 && $iters < $arg1
>>> set $ret_addr=((char*)$frame_ptr) + sizeof(void*)
>>> printf "frameptr=%p, ret_addr=%p\n", (void*)$frame_ptr, *(void**)$ret_addr
>>> printf " "
>>> info line **(void***)$ret_addr
>>> set $frame_ptr=*(void**)$frame_ptr
>>> set $iters=$iters+1
>>> end
>>> end
>>>
>>> trace_stack frame->tf_rbp 20
>>
>> Thank you for this script.
>> Here is an example from my practice.
>>
>> (kgdb) bt
>> #0 doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:291
>> #1 0xffffffff8063453f in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:359
>> #2 0xffffffff80634ba4 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:635
>> #3 0xffffffff806348a3 in panic (fmt=0x0) at /usr/src/sys/kern/kern_shutdown.c:568
>> #4 0xffffffff8041bba7 in db_panic (addr=<value optimized out>, have_addr=false, count=0, modif=0x0) at /usr/src/sys/ddb/db_command.c:473
>> #5 0xffffffff8041b67b in db_command (cmd_table=0x0) at /usr/src/sys/ddb/db_command.c:440
>> #6 0xffffffff8041b524 in db_command_loop () at /usr/src/sys/ddb/db_command.c:493
>> #7 0xffffffff8041de0b in db_trap (type=<value optimized out>, code=0) at /usr/src/sys/ddb/db_main.c:251
>> #8 0xffffffff80669de8 in kdb_trap (type=19, code=0, tf=0xffffffff80f976d0) at /usr/src/sys/kern/subr_kdb.c:653
>> #9 0xffffffff80820d26 in trap (frame=0xffffffff80f976d0) at /usr/src/sys/amd64/amd64/trap.c:381
>> #10 0xffffffff80809623 in nmi_calltrap () at /usr/src/sys/libkern/explicit_bzero.c:28
>
> This may be part of the problem. The trapframe unwinder depends on function names
> to know when it is crossing a trapframe. nmi_calltrap() is not the function at
> explicit_bzero.c:28. Usually debugging this sort of thing starts by going to frame 11
> and comparing its registers with the values in the trapframe. They should match, but
> sometimes you will find them shifted by one or two, etc.
And it seems that nmi_calltrap being a label within an assembler-defined
procedure confuses the in-tree kgdb quite a lot:
(kgdb) list *0xffffffff80809623
0xffffffff80809623 is at /usr/src/sys/libkern/explicit_bzero.c:28.
23 void
24 explicit_bzero(void *buf, size_t len)
25 {
26 memset(buf, 0, len);
27 __explicit_bzero_hook(buf, len);
28 }
(kgdb) list nmi_calltrap
23 void
24 explicit_bzero(void *buf, size_t len)
25 {
26 memset(buf, 0, len);
27 __explicit_bzero_hook(buf, len);
28 }
(kgdb) disassemble nmi_calltrap
Dump of assembler code for function nmi_calltrap:
0xffffffff8080961b <nmi_calltrap+0>: mov %rsp,%rdi
0xffffffff8080961e <nmi_calltrap+3>: callq 0xffffffff80820670 <trap>
0xffffffff80809623 <nmi_calltrap+8>: test %ebx,%ebx
0xffffffff80809625 <nmi_calltrap+10>: je 0xffffffff80809695 <nocallchain>
0xffffffff80809627 <nmi_calltrap+12>: mov %gs:0x0,%rax
0xffffffff80809630 <nmi_calltrap+21>: or %rax,%rax
0xffffffff80809633 <nmi_calltrap+24>: je 0xffffffff80809695 <nocallchain>
0xffffffff80809635 <nmi_calltrap+26>: testl $0x400000,0xec(%rax)
0xffffffff8080963f <nmi_calltrap+36>: je 0xffffffff80809695 <nocallchain>
0xffffffff80809641 <nmi_calltrap+38>: mov %rsp,%rsi
0xffffffff80809644 <nmi_calltrap+41>: mov $0xc0,%rcx
0xffffffff8080964b <nmi_calltrap+48>: mov %gs:0x220,%rdx
0xffffffff80809654 <nmi_calltrap+57>: sub %rcx,%rdx
0xffffffff80809657 <nmi_calltrap+60>: mov %rdx,%rdi
0xffffffff8080965a <nmi_calltrap+63>: shr $0x3,%rcx
0xffffffff8080965e <nmi_calltrap+67>: cld
0xffffffff8080965f <nmi_calltrap+68>: rep movsq %ds:(%rsi),%es:(%rdi)
0xffffffff80809662 <nmi_calltrap+71>: mov %ss,%eax
0xffffffff80809664 <nmi_calltrap+73>: push %rax
0xffffffff80809665 <nmi_calltrap+74>: push %rdx
0xffffffff80809666 <nmi_calltrap+75>: pushfq
0xffffffff80809667 <nmi_calltrap+76>: mov %cs,%eax
0xffffffff80809669 <nmi_calltrap+78>: push %rax
0xffffffff8080966a <nmi_calltrap+79>: pushq $0xffffffff80809671
0xffffffff8080966f <nmi_calltrap+84>: iretq
End of assembler dump.
(kgdb) disassemble explicit_bzero
Dump of assembler code for function explicit_bzero:
0xffffffff806e74c0 <explicit_bzero+0>: push %rbp
0xffffffff806e74c1 <explicit_bzero+1>: mov %rsp,%rbp
0xffffffff806e74c4 <explicit_bzero+4>: push %r14
0xffffffff806e74c6 <explicit_bzero+6>: push %rbx
0xffffffff806e74c7 <explicit_bzero+7>: mov %rsi,%r14
0xffffffff806e74ca <explicit_bzero+10>: mov %rdi,%rbx
0xffffffff806e74cd <explicit_bzero+13>: callq 0xffffffff806e74f0 <memset>
0xffffffff806e74d2 <explicit_bzero+18>: mov %rbx,%rdi
0xffffffff806e74d5 <explicit_bzero+21>: mov %r14,%rsi
0xffffffff806e74d8 <explicit_bzero+24>: callq 0xffffffff8088a2d0 <__explicit_bzero_hook>
0xffffffff806e74dd <explicit_bzero+29>: pop %rbx
0xffffffff806e74de <explicit_bzero+30>: pop %r14
0xffffffff806e74e0 <explicit_bzero+32>: pop %rbp
0xffffffff806e74e1 <explicit_bzero+33>: retq
End of assembler dump.
The newer kgdb is smarter about this situation:
(kgdb) list *0xffffffff80809623
0xffffffff80809623 is at /usr/src/sys/amd64/amd64/exception.S:527.
522 * - Check if the thread requires a user call chain to be
523 * captured.
524 *
525 * We are still in NMI mode at this point.
526 */
527 testl %ebx,%ebx
528 jz nocallchain /* not from userspace */
529 movq PCPU(CURTHREAD),%rax
530 orq %rax,%rax /* curthread present? */
531 jz nocallchain
However, that does not seem to help with stack unwinding.
--
Andriy Gapon
More information about the freebsd-hackers mailing list