Mozilla problems on current -- fix
John Baldwin
jhb at FreeBSD.org
Mon Mar 1 15:09:38 PST 2004
On Monday 01 March 2004 03:14 pm, Danny J. Zerkel wrote:
> Looks like John's sys/kern/kern_thread.c 1.171 broke Mozilla (and
> threading in general). He appears to have left out a test that
> the sleeping thread is interruptable in one spot:
>
> --- kern_thread.c.orig Mon Mar 1 15:08:01 2004
> +++ kern_thread.c Mon Mar 1 14:26:30 2004
> @@ -646,7 +646,8 @@
> } else if (TD_ON_SLEEPQ(td2) &&
> ((td2->td_wchan == &kg->kg_completed) ||
> (td2->td_wchan == &p->p_siglist &&
> - (ku->ku_mflags & KMF_WAITSIGEVENT)))) {
> + (ku->ku_mflags & KMF_WAITSIGEVENT))) &&
> + (td2->td_flags & TDF_SINTR)) {
> sleepq_abort(td2);
> } else {
> ku->ku_flags |= KUF_DOUPCALL;
>
> This change fixes my Mozilla hangs and other panics on current.
This should not matter. This is the only place that doesn't check TDF_SINTR,
true, but it should always be set in these cases. First, look in
kern_thread.c where we msleep() on these addresses (p_siglist and
kg_completed):
if (!(p->p_flag & P_SIGEVENT) && !(ku->ku_flags &
KUF_DOUPCALL))
error = msleep(&p->p_siglist, &p->p_mtx, PPAUSE|
PCATCH,
"ksesigwait", (uap->timeout ? tvtohz(&tv) : 0));
p->p_flag &= ~P_SIGEVENT;
Note that PCATCH is set.
error = msleep(&kg->kg_completed, &p->p_mtx,
PPAUSE|PCATCH, "kserel",
(uap->timeout ? tvtohz(&tv) : 0));
kg->kg_upsleeps--;
Again, PCATCH is set. Now onto kern_synch.c and msleep():
catch = priority & PCATCH;
...
sq = sleepq_lookup(ident);
...
sleepq_add(sq, ident, mtx, wmesg, 0);
if (timo)
sleepq_set_timeout(sq, ident, timo);
if (catch) {
sig = sleepq_catch_signals(ident);
if (sig == 0 && !TD_ON_SLEEPQ(td)) {
mtx_lock_spin(&sched_lock);
td->td_flags &= ~TDF_SINTR;
mtx_unlock_spin(&sched_lock);
catch = 0;
}
} else
sig = 0;
...
sleepq_wait_foo(ident);
Thus, we see that if PCATCH is set, we call sleepq_add() and then
sleepq_catch_signals(). Bah, I see. Only the sleep queue lock is atomic for
the whole operation and not sched_lock. While your patch might fix the
assertion for this KSE case, there might be a larger set or problems with the
race here, which is that sched_lock can be dropped in between td_wchan being
set and TDF_SINTR being set. I.e, if you get an abort in between the
sleepq_add() and sleepq_catch_signals() and just use sched_lock it will get
lost when it shouldn't. For signals, I think the call to cursig() will still
work ok. I'm not sure if sleepq_catch_signals() calls anything that checks
for KUF_DOUPCALL. Let me check. Ugh, this code is too hard to follow. I
have no idea if this is actually safe or not. :( I can commit the above as a
band-aid I guess. I don't know why the KSE code doesn't just use a wakeup
here, the sleeps are in top-level functions and not buried deep.
--
John Baldwin <jhb at FreeBSD.org> <>< http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve" = http://www.FreeBSD.org
More information about the freebsd-threads
mailing list