an unkillable process and a patch
Garry Belka
garry at NetworkPhysics.COM
Mon Nov 14 20:00:04 PST 2005
Occasionally we see a Java process hang in a state where it can't be
killed, even by SIGKILL.
Apparently, one of the process's threads forks. fork1() in the kernel
attempts to enter single-threaded mode, but thread_single() never
completes: it waits until all threads except proc->p_singlethread are
suspended, and one of the remaining threads is not suspended and has
only the SLEEP flag set.
pid   thread      thid    flags     inhib  pflags  comm  wchan
1982  0xcd150180  100351  00020c00  1      0088    java
    <sched_switch+323>
    mi_switch + 426 in section .text
    thread_suspend_check + 298 in section .text
    userret + 58 in section .text
    fork_return + 18 in section .text
    fork_exit + 102 in section .text
1982  0xce120c00  100948  00000c00  1      0880    java
    <sched_switch+323>
    mi_switch + 426 in section .text
    thread_suspend_check + 298 in section .text
    userret + 58 in section .text
    ast + 844 in section .text
1982  0xcd740900  100616  00000808  2      0080    java  sbwait cd557320
    mi_switch + 426 in section .text (SLEEPING, not SUSPENDED)
    sleepq_switch + 164 in section .text
    sleepq_wait_sig + 12 in section .text
    msleep + 566 in section .text
    sbwait + 56 in section .text
    soreceive + 572 in section .text
    soo_read + 65 in section .text
    dofileread + 173 in section .text
    read + 59 in section .text
    syscall + 551 in section .text
1982  0xc3ae7900  100906  00000808  1      0080    java
    mi_switch + 426 in section .text
    sleepq_switch + 164 in section .text
    sleepq_wait_sig + 12 in section .text
    msleep + 566 in section .text
    sbwait + 56 in section .text
    soreceive + 572 in section .text
    soo_read + 65 in section .text
    dofileread + 173 in section .text
    read + 59 in section .text
    syscall + 551 in section .text
1982  0xcd719780  100605  00000c00  1      0880    java
    mi_switch + 426 in section .text
    thread_suspend_check + 298 in section .text
    userret + 58 in section .text
    ast + 844 in section .text
1982  0xcd6d9000  100830  00000000  1      0880    java  (p_singlethread)
    mi_switch + 426 in section .text - line 355
    thread_single + 497 in section .text - line 863
    fork1 + 169 in section .text - line 257
    fork + 24 in section .text
    syscall + 551 in section .text
Signals are not really delivered while the process is in the
single-threading state, so SIGKILL stays with the first thread in the
queue, and the result is a deadlock.
I think that we got into this state because the non-suspended thread was
running while the p_singlethread thread was trying to suspend every
other thread. All threads were marked TDF_ASTPENDING. However, a bit
later ast() failed to deal correctly with a thread that had a non-null
td->td_mailbox:
sys/kern/subr_trap.c:ast()

    if ((p->p_flag & P_SA) && (td->td_mailbox == NULL))
        thread_user_enter(td);
Below is a tentative patch. It's for 5.4-stable, but it seems to me that
the same problem should be present in 6.0.
Any comments?
Best,
Garry
Index: kern/kern_thread.c
===================================================================
RCS file: /u1/Repo/FreeBSD/sys/kern/kern_thread.c,v
retrieving revision 1.3
diff -u -r1.3 kern_thread.c
--- kern/kern_thread.c 9 Jul 2005 01:27:18 -0000 1.3
+++ kern/kern_thread.c 15 Nov 2005 03:01:22 -0000
@@ -1001,6 +1001,18 @@
 }
 
 void
+thread_check_single_suspend(struct thread *td)
+{
+	struct proc *p = td->td_proc;
+
+	if (__predict_false(P_SHOULDSTOP(p))) {
+		PROC_LOCK(p);
+		thread_suspend_check(0);
+		PROC_UNLOCK(p);
+	}
+}
+
+void
 thread_unsuspend_one(struct thread *td)
 {
 	struct proc *p = td->td_proc;
Index: kern/subr_trap.c
===================================================================
RCS file: /u1/Repo/FreeBSD/sys/kern/subr_trap.c,v
retrieving revision 1.1.1.2
diff -u -r1.1.1.2 subr_trap.c
--- kern/subr_trap.c 8 Jul 2005 03:01:08 -0000 1.1.1.2
+++ kern/subr_trap.c 15 Nov 2005 03:01:23 -0000
@@ -171,6 +171,8 @@
 
 	if ((p->p_flag & P_SA) && (td->td_mailbox == NULL))
 		thread_user_enter(td);
+	else
+		thread_check_single_suspend(td);
 	/*
 	 * This updates the p_sflag's for the checks below in one
 	 * "atomic" operation with turning off the astpending flag.
Index: sys/proc.h
===================================================================
RCS file: /u1/Repo/FreeBSD/sys/sys/proc.h,v
retrieving revision 1.1.1.5
diff -u -r1.1.1.5 proc.h
--- sys/proc.h 8 Jul 2005 03:07:51 -0000 1.1.1.5
+++ sys/proc.h 15 Nov 2005 03:01:28 -0000
@@ -887,6 +887,7 @@
 void	ksegrp_unlink(struct ksegrp *kg);
 void	thread_signal_add(struct thread *td, int sig);
 struct	thread *thread_alloc(void);
+void	thread_check_single_suspend(struct thread *td);
 void	thread_exit(void) __dead2;
 int	thread_export_context(struct thread *td, int willexit);
 void	thread_free(struct thread *td);