SIGSTOP and SIGKILL

Ravi Murty ravi.murty at gmail.com
Mon Mar 14 05:10:56 UTC 2011


Hi everybody,

I'm using FreeBSD 8.0 and I seem to have a race condition that is fairly
reproducible. Let me try and describe it.

The basic idea is that we use SIGSTOP and SIGCONT to stop and restart
threads of a process - call it p1. A caller (call it c1) SIGSTOPs and
SIGCONTs p1 until another caller (call it c2) decides to come along and kill
the process. Both callers grab proc_lock for p1 and use pfind(...) to find
the process before subjecting p1 to any of these signals. What I see is that
SIGKILL is somehow ignored in favor of SIGSTOP, and the process (and all of
its threads) somehow ends up suspended.
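
For reference, here is a minimal userland sketch of the same signal pattern. It is only an approximation: the callers described above run in the kernel, while this uses fork() and kill() from a single parent, so the signals are sent sequentially and it shows the expected outcome rather than the race itself. The function name stop_cont_then_kill and its rounds parameter are made up for illustration.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Userland approximation of the c1/c2 pattern (hypothetical helper).
 * Returns the signal that terminated the child, 0 if the child exited
 * normally, or -1 on error.
 */
int
stop_cont_then_kill(int rounds)
{
    pid_t pid;
    int i, status;

    assert(rounds >= 0);
    pid = fork();
    if (pid < 0)
        return (-1);
    if (pid == 0) {
        /* p1: idle until a signal terminates us. */
        for (;;)
            pause();
    }
    /* c1: repeatedly stop and restart p1. */
    for (i = 0; i < rounds; i++) {
        if (kill(pid, SIGSTOP) != 0 || kill(pid, SIGCONT) != 0)
            goto fail;
    }
    /* c1 stops p1 one more time, then c2 kills it. */
    if (kill(pid, SIGSTOP) != 0 || kill(pid, SIGKILL) != 0)
        goto fail;
    /* Without WUNTRACED this only returns once p1 has terminated. */
    if (waitpid(pid, &status, 0) != pid)
        return (-1);
    return (WIFSIGNALED(status) ? WTERMSIG(status) : 0);
fail:
    /* Best-effort cleanup so the child is not left behind. */
    (void)kill(pid, SIGKILL);
    (void)waitpid(pid, &status, 0);
    return (-1);
}
```

The behavior I expect is that stop_cont_then_kill(100) returns SIGKILL, i.e. SIGKILL wins even though p1 was stopped when it was posted.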

As a side note, we changed our implementation to "post" SIGKILL to all
threads of p1 because of another race we discovered. In this case the thread
selected by psignal/tdsignal happened to be in thr_exit() on its way to
dying. Because it was still on the list of available threads for the process,
it was picked (FIRST_TD_IN_PROC), but because it was in thr_exit it died,
taking SIGKILL with it.
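
To make that earlier race concrete, here is a toy userland model of it. This is not kernel code: struct model_thread, its fields, and both functions are invented for illustration, and "the signal is seen" is simplified to "some thread that is not already exiting has it pending".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a kernel thread; all names here are hypothetical. */
struct model_thread {
    bool exiting;      /* thread is already inside thr_exit() */
    bool kill_pending; /* SIGKILL has been queued on this thread */
};

/* True if at least one thread that will keep running has the signal. */
static bool
kill_will_be_seen(const struct model_thread *td, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (td[i].kill_pending && !td[i].exiting)
            return (true);
    }
    return (false);
}

/* Models picking the first thread on the list, exiting or not. */
bool
post_kill_first(struct model_thread *td, size_t n)
{
    assert(td != NULL || n == 0);
    if (n > 0)
        td[0].kill_pending = true;
    return (kill_will_be_seen(td, n));
}

/* Models our workaround: queue the signal on every thread. */
bool
post_kill_all(struct model_thread *td, size_t n)
{
    size_t i;

    assert(td != NULL || n == 0);
    for (i = 0; i < n; i++)
        td[i].kill_pending = true;
    return (kill_will_be_seen(td, n));
}
```

With the first thread marked exiting, post_kill_first() reports the signal as lost, while post_kill_all() still reaches a live thread.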

What I see in this new race is the following. We post SIGKILL to every
thread of the process, and c2 leaves, releasing p1's proc_lock. As each thread
returns to ring3 via the trap handler it sees that it has a signal to deal
with and calls cursig and postsig. In the code, postsig eventually calls
sigexit (default behavior) which via exit1 calls thread_suspend_check
causing threads to kill themselves as long as the first thread that is here
calls thread_single(SINGLE_EXIT). In our case, the process (which is still
on the global all_proc list) is subjected to SIGSTOP which sets the
P_STOPPED_SIG flag on p1. As each thread makes its way through
thread_suspend_check it suspends itself because P_SHOULDSTOP ends up being
true. In the end I have a process whose threads have all taken SIGKILL (I
can dump each thread's state and see no signals in its siglist), but
the process hasn't died. This seems odd. It would seem that any signals
posted after the process receives a SIGKILL should be ignored, but how do we
detect that, especially after SIGKILL has been cleared from the siglist
because the thread is in the middle of taking the signal? Alternatively, if
the signal being taken is SIGKILL, the kernel needs to avoid saying "I'll
stop the process now because I've been asked to".
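
To illustrate the kind of check I have in mind, here is a toy userland model. It is a sketch of the idea, not a patch. MODEL_STOPPED_SIG stands in for P_STOPPED_SIG, while MODEL_KILL_TAKEN is a hypothetical "SIGKILL is being taken" flag that I am assuming would have to be recorded somewhere on the process; struct model_proc and the function names are likewise made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_STOPPED_SIG 0x1 /* stands in for P_STOPPED_SIG */
#define MODEL_KILL_TAKEN  0x2 /* hypothetical: SIGKILL is being taken */

/* Toy stand-in for the process; not the kernel's struct proc. */
struct model_proc {
    int flags;
};

/* A thread has pulled SIGKILL off its siglist and is acting on it. */
void
model_take_sigkill(struct model_proc *p)
{
    assert(p != NULL);
    p->flags |= MODEL_KILL_TAKEN;
}

/* Post a stop request; returns false if it was ignored. */
bool
model_post_sigstop(struct model_proc *p)
{
    assert(p != NULL);
    if (p->flags & MODEL_KILL_TAKEN)
        return (false); /* process is dying, do not stop it */
    p->flags |= MODEL_STOPPED_SIG;
    return (true);
}

/* What a thread would ask before suspending itself. */
bool
model_should_suspend(const struct model_proc *p)
{
    assert(p != NULL);
    if (p->flags & MODEL_KILL_TAKEN)
        return (false); /* keep going so the exit can finish */
    return ((p->flags & MODEL_STOPPED_SIG) != 0);
}
```

The model checks the flag in two places: when the stop is posted and again when a thread decides whether to suspend. The second check covers the case where the stop flag was already set before SIGKILL was taken.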

Any good solutions to this problem?

Thanks
Ravi Murty

