Unkillable KSE threaded proc

Andrew Gallatin gallatin at cs.duke.edu
Fri Sep 17 04:47:39 PDT 2004


Julian Elischer writes:
 > John Baldwin wrote:
 > > On Thursday 16 September 2004 09:42 am, Andrew Gallatin wrote:
 > > 
 > >>Julian Elischer writes:
 > >> > Andrew, please try -current on its own now..
 > >> > I have checked in some fixes that have helped others.
 > >>
 > >>OK, preemption off... Still a system lockup, but a little different.
 > >>
 > >>The interesting thing here is that continuing and breaking into the
 > >>debugger repeatedly seems to show that thread 0xc1646af0 is looping in
 > >>exit.  I've seen him in thread_single, thread_suspend_check, and in
 > >>exit itself at kern_exit.c:163, etc.  A breakpoint in
 > >>thread_suspend_one never triggers, so I guess he's holding the proc
 > >>lock and just looping forever.  A breakpoint in _mtx_assert() shows
 > >>him asserting the proc lock in thread_suspend_check at kern_thread.c:898.
 > >>Over and over.
 > > 
 > > 
 > > There is definitely some sort of infinite loop here.  Stripping out the 
 > > comments in exit1() for that section of code reveals basically:
 > > 
 > >         PROC_LOCK(p);
 > >         if (p->p_flag & P_HADTHREADS) {
 > > retry:
 > >                 thread_suspend_check(0);
 > >                 if (thread_single(SINGLE_EXIT))
 > >                         goto retry;
 > >         }
 > >         p->p_flag |= P_WEXIT;
 > >         PROC_UNLOCK(p);
 > > 
 > > So it's easy to see how it can get stuck in a loop, I think.  If thread_single() 
 > > never drops the lock then other threads that are waiting to die can't 
 > > actually wait because they can never get the proc lock so that they can die.
 > > 
 > 
 > 
 > hmm interesting..
 > but this code hasn't changed in ages...
 > 
 > 
 > in thread_single we see:
 > 
 >                  thread_suspend_one(td);
 >                  PROC_UNLOCK(p);
 >                  mi_switch(SW_VOL, NULL);
 >                  mtx_unlock_spin(&sched_lock);
 >                  PROC_LOCK(p);
 >                  mtx_lock_spin(&sched_lock);
 > 
 > so when it sleeps it releases the proc lock.

But that's the problem.  As I said above, a breakpoint in
thread_suspend_one never triggers, so this code is never reached.  It
must be bailing out of thread_single() before this happens.

Did somebody fix ddb?  If yes, I can try stepping through it if you like.

Maybe a quick fix would be to drop the proc lock and tsleep for a
clock tick at the bottom of the infinite loop...
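
To illustrate the idea, here is a rough userspace sketch using pthreads.
It is only an analogy, not the kernel code: proc_lock, dying_thread(),
exit_with_backoff() and the 10 ms "tick" are all made-up names and values,
and nanosleep() stands in for tsleep().  The retry loop drops the lock and
sleeps at the bottom of each pass, so the threads waiting to die can
acquire it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

#define MAX_THREADS 16

/* Hypothetical userspace stand-in for the proc lock. */
static pthread_mutex_t proc_lock = PTHREAD_MUTEX_INITIALIZER;
static int threads_remaining;           /* protected by proc_lock */

/* A thread "waiting to die": it needs the lock before it can go away. */
static void *
dying_thread(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&proc_lock);
	threads_remaining--;
	pthread_mutex_unlock(&proc_lock);
	return (NULL);
}

/*
 * Analogue of the exit retry loop with the proposed workaround applied.
 * Returns the number of retries needed before all other threads were gone.
 */
int
exit_with_backoff(int nthreads)
{
	pthread_t tids[MAX_THREADS];
	struct timespec tick = { 0, 10 * 1000 * 1000 };  /* assumed 10 ms */
	int created = 0, retries = 0, i;

	assert(nthreads >= 0 && nthreads <= MAX_THREADS);

	pthread_mutex_lock(&proc_lock);
	threads_remaining = 0;
	for (i = 0; i < nthreads; i++) {
		/* New threads block on proc_lock, which we still hold. */
		if (pthread_create(&tids[created], NULL, dying_thread, NULL) == 0) {
			created++;
			threads_remaining++;
		}
	}

	while (threads_remaining > 0) {
		/*
		 * The workaround: without dropping the lock here, this loop
		 * would spin forever and nobody else could make progress.
		 */
		pthread_mutex_unlock(&proc_lock);
		nanosleep(&tick, NULL);
		pthread_mutex_lock(&proc_lock);
		retries++;
	}
	pthread_mutex_unlock(&proc_lock);

	for (i = 0; i < created; i++)
		pthread_join(tids[i], NULL);
	return (retries);
}
```

Calling exit_with_backoff(4) should return after at least one retry.
If the unlock/sleep/lock lines are removed, the loop never terminates,
which is the same shape as the hang described above.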

Drew

