RELENG_6 panic under heavy load
John Baldwin
jhb at freebsd.org
Wed Dec 6 09:10:02 PST 2006
On Thursday 16 November 2006 11:09, Gleb Smirnoff wrote:
> On Thu, Nov 16, 2006 at 02:15:25PM +0300, Gleb Smirnoff wrote:
> T> On Thu, Nov 16, 2006 at 01:24:36PM +0300, Gleb Smirnoff wrote:
> T> T> I wonder why UMA was suspected to be the problem. Dima gave
> T> T> me access to the core. Here are more details from the trace:
>
> And even more:
>
> (kgdb) thread 133
> [Switching to thread 133 (Thread 100147)]#0 sched_switch (td=0xd745c900,
>     newtd=0xd51f7a80, flags=2) at /usr/src/sys/kern/sched_4bsd.c:980
> 980 sched_lock.mtx_lock = (uintptr_t)td;
> (kgdb) frame 9
> #9 0xd07a6e16 in syscall (frame=
>     {tf_fs = 134938683, tf_es = 59, tf_ds = -809566149, tf_edi = 134997504,
>     tf_esi = 134998528, tf_ebp = -813707944, tf_isp = -170046108,
>     tf_ebx = 672261300, tf_edx = 0, tf_ecx = 134969072, tf_eax = 1,
>     tf_trapno = 0, tf_err = 2, tf_eip = 672832335, tf_cs = 51,
>     tf_eflags = 646, tf_esp = -813707972, tf_ss = 59})
> at /usr/src/sys/i386/i386/trap.c:1034
> 1034 userret(td, &frame, sticks);
> (kgdb) p *callp
> $92 = {sy_narg = 65539, sy_call = 0xd0630550 <poll>, sy_auevent = 43012}
>
> (kgdb) set $poll = (struct thread *)0xd745c900
> (kgdb) set $fork = (struct thread *)0xd59aad80
>
> (kgdb) p $poll->td_state
> $93 = TDS_INHIBITED
> (kgdb) p $poll->td_inhibitors
> $94 = 1 == TDI_SUSPENDED
> (kgdb) p/x $poll->td_flags
> $96 = 0x1010c01 == TDF_BORROWING | TDF_BOUNDARY | TDF_ASTPENDING |
>     TDF_NEEDRESCHED | TDF_SCHED0
> (kgdb) p $fork->td_state
> $97 = TDS_INHIBITED
> (kgdb) p $fork->td_inhibitors
> $98 = 8 == TDI_LOCK
> (kgdb) p/x $fork->td_flags
> $99 = 0x1000000 == TDF_SCHED0
>
> Not everything clear yet, but looks like:
>
> 1) $fork thread obtains proc lock
> 2) $poll thread blocks on proc lock
> 3) $fork thread has suspended the $poll thread in thread_single()
> 4) $fork thread temporarily unlocks proc lock (line 821) and is
> preempted by $poll thread
> 5) $poll thread obtains proc lock, and starts doing its poll job
> 6) $fork thread blocks on proc lock, and is added to its turnstile
> 7) $poll thread drops the proc lock, but isn't preempted by $fork
> 8) $poll thread exits and is preempted by $fork
>
> ...) and here is something difficult to understand, when $poll tries to
> make $fork runnable, while $fork is trying to put itself in the turnstile
> that is owned by $poll

Hmm.  I'm guessing the problem is that the $poll thread is suspended (not
exited) while holding the proc lock?  If so, that thread can't run again to
release the lock.  Ah, yes, I see the bug.  Something like this should fix
it:

Index: kern_thread.c
===================================================================
RCS file: /usr/cvs/src/sys/kern/kern_thread.c,v
retrieving revision 1.216.2.6
diff -u -r1.216.2.6 kern_thread.c
--- kern_thread.c 2 Sep 2006 17:29:57 -0000 1.216.2.6
+++ kern_thread.c 6 Dec 2006 17:06:26 -0000
@@ -969,7 +969,9 @@
 	TAILQ_REMOVE(&p->p_suspended, td, td_runq);
 	TD_CLR_SUSPENDED(td);
 	p->p_suspcount--;
+	critical_enter();
 	setrunnable(td);
+	critical_exit();
 }
 
 /*

This forces setrunnable() to run inside a nested critical section, so we
won't preempt during setrunnable().  The preemption is deferred until either
the caller of thread_unsuspend_one() eventually releases sched_lock or, in
the case you ran into, the thread does a PROC_UNLOCK() and calls mi_switch().
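
To illustrate the deferral, here is a rough user-space sketch of the idea.
It is not kernel code: the toy_* names, the struct, and the rule "defer the
switch when the nesting depth is greater than 1" are assumptions made for the
illustration, with holding sched_lock modeled as nesting depth 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the running thread's preemption state (hypothetical). */
struct toy_thread {
	int  critnest;    /* critical-section nesting depth */
	bool owepreempt;  /* a switch was requested while nested */
	int  switches;    /* number of context switches taken */
};

struct toy_result {
	bool switched_in_setrunnable; /* did we switch inside "setrunnable"? */
	int  total_switches;
};

static void toy_switch(struct toy_thread *td) { td->switches++; }

static void toy_critical_enter(struct toy_thread *td) { td->critnest++; }

/* Leaving the outermost section performs any switch that was deferred. */
static void toy_critical_exit(struct toy_thread *td)
{
	assert(td->critnest > 0);
	td->critnest--;
	if (td->critnest == 0 && td->owepreempt) {
		td->owepreempt = false;
		toy_switch(td);
	}
}

/* Models the preemption check made when another thread becomes runnable. */
static bool toy_maybe_preempt(struct toy_thread *td)
{
	if (td->critnest > 1) {       /* nested: remember it, switch later */
		td->owepreempt = true;
		return false;
	}
	toy_switch(td);               /* only the lock is held: switch now */
	return true;
}

/* Models thread_unsuspend_one() with and without the extra nesting. */
static struct toy_result toy_unsuspend_one(bool patched)
{
	struct toy_thread td = { 0, false, 0 };
	struct toy_result r;

	toy_critical_enter(&td);              /* "sched_lock" acquired */
	if (patched)
		toy_critical_enter(&td);      /* the added critical_enter() */
	r.switched_in_setrunnable = toy_maybe_preempt(&td);
	if (patched)
		toy_critical_exit(&td);       /* the added critical_exit() */
	toy_critical_exit(&td);               /* "sched_lock" released */
	r.total_switches = td.switches;
	return r;
}
```

Calling toy_unsuspend_one(false) reports a switch in the middle of the
"setrunnable" step, while toy_unsuspend_one(true) reports none there and
takes the same single switch only once the outermost section is left.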
--
John Baldwin