Andrew Gallatin gallatin at cs.duke.edu
Thu Sep 16 06:42:33 PDT 2004

Julian Elischer writes:
 > Andrew, please try -current on its own now..
 > I have checked in some fixes that have helped others.

OK, preemption off... Still a system lockup, but a little different.

The interesting thing here is that continuing and breaking into the
debugger repeatedly seems to show that thread 0xc1646af0 is looping in
exit.  I've seen him in thread_single, thread_suspend_check, and in
exit itself at kern_exit.c:163, etc.  A breakpoint in
thread_suspend_one never triggers, so I guess he's holding the proc
lock and just looping forever.  A breakpoint in _mtx_assert() shows
him asserting the proc lock in thread_suspend_check at kern_thread.c:898.
Over and over.

I don't know how to figure out where the other CPU-bound thread is.  A
ktrace does not show it bouncing around in our driver's ioctl handler.
If you have a KTR mask you think might be helpful, I'd be happy to
build a ktr kernel to try to get more info from the thread on CPU1.
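
For reference, a KTR kernel config would look roughly like the fragment
below.  This is only a sketch: the option names are the ones documented
in ktr(4), but the event classes, buffer size, and CPU mask are
placeholder guesses on my part, not a mask anyone has recommended.

    # Hypothetical KTR settings; the classes chosen are an assumption.
    options         KTR
    options         KTR_ENTRIES=8192                  # must be a power of two
    options         KTR_COMPILE=(KTR_PROC|KTR_LOCK)   # classes compiled in
    options         KTR_MASK=(KTR_PROC|KTR_LOCK)      # classes enabled at boot
    options         KTR_CPUMASK=0x3                   # log events from CPU0 and CPU1

With a kernel built that way, "show ktr" at the db> prompt should dump
the trace buffer after the lockup.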


[halt - sent]
KDB: enter: Line break on console
[thread 100097]
Stopped at      kdb_enter+0x30: leave
db> sho pcpu
cpuid        = 0
curthread    = 0xc1646af0: pid 575 "mx_pingpong"
curpcb       = 0xe52ceda0
fpcurthread  = none
idlethread   = 0xc1561640: pid 12 "idle: cpu0"
APIC ID      = 0
currentldt   = 0x30
db> tr
kdb_enter(c066f1a0,c063158a,a0,c16f3140,e52ceba8) at kdb_enter+0x30
siointr1(c1637800,0,c066ef68,6ad,e52ceb90) at siointr1+0xd1
siointr(c1637800,c06a18c0,c065cd10,e52ceb9c,4) at siointr+0x35
intr_execute_handlers(c1556e90,e52ceba8,e52cec08,c061bf03,34) at intr_execute_handlers+0xb8
lapic_handle_intr(34) at lapic_handle_intr+0x3b
Xapic_isr1() at Xapic_isr1+0x33
--- interrupt, eip = 0xc04cd58d, esp = 0xe52cebec, ebp = 0xe52cec08 ---
_mtx_assert(c186de6c,1,c065cd10,382,c186de00) at _mtx_assert+0xc
thread_suspend_check(0,0,c0659712,88,e52cec68) at thread_suspend_check+0x59
exit1(c1646af0,9,c065c326,996,1) at exit1+0xc9
expand_name(c1646af0,9,c065c326,928,0) at expand_name
postsig(9,0,c065ef8f,100,1020800) at postsig+0x1e0
ast(e52ced48) at ast+0x46e
doreti_ast() at doreti_ast+0x17
db> ps
  pid   proc     uarea   uid  ppid  pgrp  flag   stat  wmesg    wchan  cmd
  575 c186de00 e6772000 1387     1   573 000c482 (threaded)  mx_pingpong
   thread 0xc1646af0 ksegrp 0xc1871070 [CPU 0]
   thread 0xc1646c80 ksegrp 0xc1871070 [SUSP]
   thread 0xc1646e10 ksegrp 0xc1871070 [RUNQ]
   thread 0xc1648000 ksegrp 0xc15ba230 [CPU 1]
db> call db_trace_thread(0xc1646c80, 10)
sched_switch(c1646c80,c1646af0,1,11d,a273455a) at sched_switch+0x16e
mi_switch(1,c1646af0,c065cd10,335,c186de6c) at mi_switch+0x2ad
thread_single(1,0,c0659712,88,67e8ac52) at thread_single+0x1d7
exit1(c1646c80,9,c065c326,996,1) at exit1+0xd5
expand_name(c1646c80,9,c065c326,928,0) at expand_name
postsig(9,0,c065ef8f,100,1020800) at postsig+0x1e0
ast(e52d1d48) at ast+0x46e
doreti_ast() at doreti_ast+0x17
db> call db_trace_thread(0xc1646e10, 10)
sched_switch(c1646e10,0,2,117,8da55b4a) at sched_switch+0x16e
mi_switch(2,0,c065ef8f,f5,1010000) at mi_switch+0x2ad
ast(e52d4d48) at ast+0x3c1
doreti_ast() at doreti_ast+0x17
db> call db_trace_thread(0xc1648000, 10)
sched_switch(18e,3a99,c15ba230,1e,0) at sched_switch+0x16e
__func__.0() at __func__.0+0xacd5

