odd issues with DDB vs GDB

Patrick Mahan pmahan at adaranet.com
Thu Sep 16 00:12:28 UTC 2010


All,

I am trying to debug a system hang occurring on my HP ProLiant G6 running some of our
kernel software.  Under certain test loads, the system hangs completely: no keyboard,
no console, etc.  I suspect some of the kernel code that I have inherited, which does
a lot of locking (lots of data structures, each having its own mutex lock (sleepable)).

I rebuilt the kernel to include the following:

options KDB
options DDB
options GDB
options MUTEX_NOINLINE
options MUTEX_DEBUG
options WITNESS
options WITNESS_SKIPSPIN

options SW_WATCHDOG  # Enable to force us into the debugger on a hang

When the watchdog fires, this places me in the kernel DDB debugger.  The backtrace
shown by DDB makes a lot of sense: it shows we are blocked in _mtx_lock_flags()+0x6f.

Great, so I switch to the GDB backend:

db> gdb
Step to enter the remote GDB backend.
db> s
$T0510:a6f86c80fff*";thread:186c0;#62
gdb kernel.debug
Current directory is ~/devel/pm_bz5486/FBSD80REL/amd64/obj/usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/MPATH/
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

(gdb) target remote 10.10.29.111:7028
Remote debugging using 10.10.29.111:7028

0xffffffff806cf8a6 in kdb_init () at /usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/kern/subr_kdb.c:361
warning: Unable to find dynamic linker breakpoint function.
GDB will be unable to debug shared library initializers
and track explicitly loaded dynamic code.
warning: shared library handler failed to enable breakpoint

gdb>

So right away I am somewhat suspicious as it is showing me a completely different entry
point.
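
As a cross-check, the stop-reply packet that the GDB stub printed on the console
after "s" can be decoded by hand.  The following is only a minimal Python sketch,
assuming the standard GDB remote serial protocol rules (checksum is the sum of the
payload bytes modulo 256, "*" run-length encoding, register values in target byte
order) and GDB's amd64 register numbering, where register 0x10 is rip.  The helper
names are made up for illustration.

```python
# Stop-reply packet copied verbatim from the console output above.
PACKET = '$T0510:a6f86c80fff*";thread:186c0;#62'

RIP_REGNUM = 16  # assumption: GDB's amd64 numbering (rax=0 ... r15=15, rip=16)


def rle_expand(data):
    """Expand run-length encoding: '*' followed by a count character repeats
    the previous character another (ord(count) - 29) times."""
    out = []
    i = 0
    while i < len(data):
        if data[i] == "*" and out and i + 1 < len(data):
            out.extend(out[-1] * (ord(data[i + 1]) - 29))
            i += 2
        else:
            out.append(data[i])
            i += 1
    return "".join(out)


def decode_stop_reply(packet):
    """Decode a '$T..#xx' stop reply into signal, registers and thread id."""
    body, _, checksum = packet[1:].rpartition("#")
    # The checksum covers the payload as sent, i.e. before RLE expansion.
    if sum(body.encode("ascii")) % 256 != int(checksum, 16):
        raise ValueError("checksum mismatch")
    body = rle_expand(body)
    if not body.startswith("T"):
        raise ValueError("not a T stop reply")
    result = {"signal": int(body[1:3], 16), "registers": {}}
    for field in filter(None, body[3:].split(";")):
        key, _, value = field.partition(":")
        if key == "thread":
            result["thread"] = int(value, 16)
        else:
            # Key is a hex register number; value is raw little-endian bytes.
            raw = bytes.fromhex(value)
            result["registers"][int(key, 16)] = int.from_bytes(raw, "little")
    return result


info = decode_stop_reply(PACKET)
print("signal:", info["signal"])                    # 5 (SIGTRAP)
print("rip:   ", hex(info["registers"][RIP_REGNUM]))  # 0xffffffff806cf8a6
print("thread:", info["thread"])                    # 100032
```

The decoded rip equals the address GDB prints on attach, and the thread id equals
the tid DDB traced, so both backends appear to be looking at the same stopped
thread; the disagreement seems to start with how that address and the frames
above it are resolved.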

DDB showed

Tracing pid 0 tid 100032 td 0xffffff0002668390
breakpoint() at breakpoint+0x5
kdb_enter() at kdb_enter+0x52
watchdog_fire() at watchdog_fire+0xda
hardclock() at hardclock+0x73
lapic_handle_timer() at lapic_handle_timer+0x120
Xtimerint() at Xtimerint+0x8c

But GDB reports kdb_init() at subr_kdb.c:361, as shown above.

A backtrace (bt) in GDB does not show the same stack signature.
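
To make the mismatch concrete, the function names from the two traces can be
compared mechanically.  This is only a rough Python sketch: the sample text is
the innermost frames from the attached log with the source paths trimmed, and
the regular expressions assume exactly the line formats seen in that log.

```python
import re

# Innermost DDB frames, copied from the attached log.
DDB_TRACE = """\
breakpoint() at breakpoint+0x5
kdb_enter() at kdb_enter+0x52
watchdog_fire() at watchdog_fire+0xda
hardclock() at hardclock+0x73
lapic_handle_timer() at lapic_handle_timer+0x120
Xtimerint() at Xtimerint+0x8c
"""

# Innermost GDB frames from the same log (the "at <file>:<line>" part trimmed).
GDB_BT = """\
#0  0xffffffff806cf8a6 in kdb_init ()
#1  0xffffffff8064c4da in _cv_wait (cvp=0xffffff800011e340, lock=0xffffffff80a9cd1d)
#2  0xffffffff8064bd33 in tvtohz (tv=0x2668390)
#3  0xffffffff80988cf0 in lapic_handle_timer (frame=0xffffff800011e3b0)
#4  0xffffffff809816ac in Xinvlpg ()
"""

DDB_FRAME = re.compile(r"^(\w+)\(\) at \1\+0x[0-9a-f]+$", re.MULTILINE)
GDB_FRAME = re.compile(r"^#\d+\s+0x[0-9a-f]+ in (\w+) \(", re.MULTILINE)


def compare(ddb_text, gdb_text):
    """Return (common, only_ddb, only_gdb) function names, in trace order."""
    ddb = DDB_FRAME.findall(ddb_text)
    gdb = GDB_FRAME.findall(gdb_text)
    common = [name for name in ddb if name in gdb]
    only_ddb = [name for name in ddb if name not in gdb]
    only_gdb = [name for name in gdb if name not in ddb]
    return common, only_ddb, only_gdb


common, only_ddb, only_gdb = compare(DDB_TRACE, GDB_BT)
print("in both: ", common)
print("DDB only:", only_ddb)
print("GDB only:", only_gdb)
```

For these frames only lapic_handle_timer appears in both traces; every other
name differs.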

I have attached the complete log for those who are interested.  Is there a reason for the wide
difference between DDB and GDB?  Am I invoking gdb incorrectly?

Thanks for the education, as always!

Patrick
-------------- next part --------------
Debugging a system hang.  Enabled watchdog(4) and built the kernel with KDB, DDB
and GDB.  I am trying to debug this via remote GDB, but what DDB shows for a stack
trace and what GDB shows are two separate animals.

The external serial port is set up with the following in /boot/loader.conf:

console="comconsole vidconsole"
comconsole_speed=9600
hint.uart.0.flags="0x90"
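
For readers unfamiliar with the flags value, here is a small Python sketch that
splits it into bits.  It assumes the meanings given in uart(4) (0x10: potential
system console, 0x80: port used for remote kernel debugging); the table and the
function name are illustrative only and cover just these two bits.

```python
# Assumed from uart(4); only the two bits relevant to this setup are listed.
UART_FLAG_BITS = {
    0x10: "potential system console",
    0x80: "remote kernel debugging port",
}


def describe_uart_flags(value):
    """Return a description for each known bit set in a hint.uart.N.flags value."""
    names = [desc for bit, desc in sorted(UART_FLAG_BITS.items()) if value & bit]
    unknown = value & ~sum(UART_FLAG_BITS)
    if unknown:
        names.append("unrecognized bits 0x%x" % unknown)
    return names


flags = int("0x90", 0)  # value from hint.uart.0.flags above
for name in describe_uart_flags(flags):
    print(name)
```

Under that assumption, 0x90 marks the same port as both console and debug port.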

The serial port is accessed via a Cyclades ACS console server: 'telnet 10.10.29.111 70XX', where XX is the physical port number.

The system comes up fine and testing is initiated.  Eventually the system hangs
and the watchdog fires, dropping us into DDB:

DDB output

db> trace
Tracing pid 0 tid 100032 td 0xffffff0002668390
breakpoint() at breakpoint+0x5
kdb_enter() at kdb_enter+0x52
watchdog_fire() at watchdog_fire+0xda
hardclock() at hardclock+0x73
lapic_handle_timer() at lapic_handle_timer+0x120
Xtimerint() at Xtimerint+0x8c
--- interrupt, rip = 0xffffffff80688532, rsp = 0xffffff800011e460, rbp = 0xffffff800011e4c0 ---
_mtx_lock_sleep() at _mtx_lock_sleep+0x92
_mtx_lock_flags() at _mtx_lock_flags+0x6f
VCDgetWithIIFremote() at VCDgetWithIIFremote+0x3f
ProcessDataPkt() at ProcessDataPkt+0x3dc
ip_input() at ip_input+0xa24
netisr_dispatch_src() at netisr_dispatch_src+0xe3
netisr_dispatch() at netisr_dispatch+0x20
gif_input() at gif_input+0x324
in_gif_input() at in_gif_input+0x28f
encap4_input() at encap4_input+0x1b8
ip_input() at ip_input+0xd1a
netisr_dispatch_src() at netisr_dispatch_src+0xe3
netisr_dispatch() at netisr_dispatch+0x20
ether_demux() at ether_demux+0x1f3
ether_input() at ether_input+0x4ab
em_rxeof() at em_rxeof+0x410
em_handle_que() at em_handle_que+0x6f
taskqueue_run() at taskqueue_run+0xbb
taskqueue_thread_loop() at taskqueue_thread_loop+0x33
fork_exit() at fork_exit+0xba
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff800011ed30, rbp = 0 ---

db>gdb
Step to enter the remote GDB backend.
db>s
^]<enter>
telnet> quit
#
# Enter the debugger via remote gdb
#
gdb kernel.debug
Current directory is ~/devel/pm_bz5486/FBSD80REL/amd64/obj/usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/MPATH/
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...
(gdb) target remote 10.10.29.111:7028
Remote debugging using 10.10.29.111:7028
0xffffffff806cf8a6 in kdb_init () at /usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/kern/subr_kdb.c:361
warning: Unable to find dynamic linker breakpoint function.
GDB will be unable to debug shared library initializers
and track explicitly loaded dynamic code.
warning: shared library handler failed to enable breakpoint
(gdb) bt
#0  0xffffffff806cf8a6 in kdb_init () at /usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/kern/subr_kdb.c:361
#1  0xffffffff8064c4da in _cv_wait (cvp=0xffffff800011e340, lock=0xffffffff80a9cd1d) at /usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/kern/kern_condvar.c:102
#2  0xffffffff8064bd33 in tvtohz (tv=0x2668390) at /usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/kern/kern_clock.c:371
#3  0xffffffff80988cf0 in lapic_handle_timer (frame=0xffffff800011e3b0) at /usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/amd64/amd64/local_apic.c:792
#4  0xffffffff809816ac in Xinvlpg () at apic_vector.S:146
#5  0xffffff0107bff3a0 in ?? ()
#6  0xffffff0107bff3a0 in ?? ()
#7  0x0000000000000004 in ?? ()
#8  0xffffff0002668390 in ?? ()
#9  0x0000000000000943 in ?? ()
#10 0xffffff800011e5e4 in ?? ()
#11 0x0000000000000004 in ?? ()
#12 0xffffff0002668000 in ?? ()
#13 0xffffff800011e4c0 in ?? ()
#14 0x000000000afe0014 in ?? ()
#15 0x0000000000000006 in ?? ()
#16 0xffffffff806dfd30 in taskqueue_thread_loop (arg=0xffffff0002668000) at /usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/kern/subr_taskqueue.c:359
#17 0xffffffff8068811f in atomic_cmpset_long (dst=0x7bff300, exp=0xffffffff80a9bc70, src=0x9430011e530) at atomic.h:158
#18 0xffffffff8063b6cf in VAagingTimer (dummy=0xffffff0107bff388) at /usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/ipr/virtual_circuits.c:2389
#19 0xffffffff8062998c in ProcessDataPkt (socklyr=0x0, iif=0xffffff010798de00, protocol=0x6, src_addr={s_addr = 0xafe0014}, dst_addr={s_addr = 0xafa001b}, src_port=0x1f90, dst_port=0x402, tcp_flags=0x12, pkt=0xffffff0003ad8700) at /usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/ipr/mpvc_forward.c:227
#20 0xffffffff807aa6c4 in ip_input (m=0xffffff0003ad8700) at /usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/netinet/ip_input.c:1032
#21 0xffffffff80778d43 in netisr_dispatch_src (proto=0x1, source=0x0, m=0xffffff0003ad8700) at /usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/net/netisr.c:934
#22 0xffffffff80779060 in netisr_start_swi (cpuid=0xffffff00, pc=0xffffffff8104eee0) at /usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/net/netisr.c:1034
#23 0xffffffff8076ff14 in gif_ioctl (ifp=0xffffff00026bd800, cmd=0x20011e790, data=0xffffffff8076ff14 "ÉÃfff\220ff\220ff\220UH\211åH\201ì\220") at /usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/net/if_gif.c:694
#24 0xffffffff8079af8f in gif_validate4 (ip=0xffffffff807a67f4, sc=0xffffff0003ad8700, ifp=0x1449ba01c0) at /usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/netinet/in_gif.c:396
#25 0xffffffff807a5c38 in encap6_input (mp=0xffffff0002668390, offp=0x1400000002, proto=0x4) at /usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/netinet/ip_encap.c:206
#26 0xffffffff807aa9ba in __bswap16 (_x=0x0) at endian.h:135
#27 0xffffffff80778d43 in netisr_dispatch_src (proto=0x1, source=0x0, m=0xffffff0003ad8700) at /usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/net/netisr.c:934
#28 0xffffffff80779060 in netisr_start_swi (cpuid=0xffffffff, pc=0xffffff800011ea10) at /usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/net/netisr.c:1034
#29 0xffffffff8076bc83 in ether_demux (ifp=0xffffff00026f6800, m=0xffffff0003ad8700) at /usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/net/if_ethersubr.c:911
#30 0xffffffff8076ba4b in ether_demux (ifp=0xffffff0003ad8700, m=0xffffff800011ead0) at /usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/net/if_ethersubr.c:778
#31 0xffffffff8038aa70 in em_rxeof (rxr=0xffffff0002719c00, count=0x63, done=0x0) at /usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/dev/e1000/if_em.c:4188
#32 0xffffffff8038360f in em_handle_que (context=0xffffff80003fc000, pending=0x1) at /usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/dev/e1000/if_em.c:1451
#33 0xffffffff806df78b in taskqueue_drain (queue=0xffffff80004006e0, task=0x100000001) at /usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/kern/subr_taskqueue.c:256
#34 0xffffffff806dfd63 in taskqueue_thread_loop () at /usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/kern/subr_taskqueue.c:375
#35 0x0000034380d52f40 in ?? ()
#36 0xffffff80004006e0 in ?? ()
#37 0xffffff0002711c00 in ?? ()
#38 0xffffff80004006e0 in ?? ()
#39 0xffffff800011ec70 in ?? ()
#40 0xffffffff8066b08a in fork_exit (callout=0xffffffff806df78b <taskqueue_drain+11>, arg=0xffffff800011ebc0, frame=0xffffff0002711c00) at /usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/kern/kern_fork.c:856
Previous frame identical to this frame (corrupt stack?)

I also ran "info threads" (output omitted).

Here is thread 100032 as gdb sees it.

  392 Thread 100032  0xffffffff806cf8a6 in kdb_init () at /usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/kern/subr_kdb.c:361

while ddb saw

Tracing pid 0 tid 100032 td 0xffffff0002668390

Why can I not see the stack correctly in gdb?


More information about the freebsd-hackers mailing list