[BUG] I think sleepqueue need to be protected in sleepq_broadcast

John Baldwin jhb at freebsd.org
Mon Sep 8 20:13:05 UTC 2008


On Tuesday 02 September 2008 09:40:49 pm Peter Wemm wrote:
> On Tue, Sep 2, 2008 at 1:08 PM, John Baldwin <jhb at freebsd.org> wrote:
> > On Sunday 31 August 2008 09:31:17 pm Tor Egge wrote:
> >> sleepq_resume_thread() contains an ownership handover of sq if the
> >> resumed thread is the last one blocked on the wait channel.  After the
> >> handover, sq
> >
> > is
> >
> >> no longer protected by the sleep queue chain lock and should no longer
> >> be accessed by sleepq_broadcast().
> >>
> >> Normally, when sleepq_broadcast() incorrectly accesses sq after the
> >
> > handover,
> >
> >> it will find the sq->sq_blocked queue to be empty, and the code appears
> >> to work.
> >>
> >> If the last correctly woken thread manages to go to sleep again very
> >> quickly
> >
> > on
> >
> >> another wait channel, sleepq_broadcast() might incorrectly determine
> >> that
> >
> > the
> >
> >> sq->sq_blocked queue isn't empty, and start doing the wrong thing.
> >
> > So disregard my earlier e-mail.  Here is a simple fix for the sleepq
> > case:
> >
> > Index: subr_sleepqueue.c
> > ===================================================================
> > --- subr_sleepqueue.c   (revision 182679)
> > +++ subr_sleepqueue.c   (working copy)
> > @@ -779,7 +779,7 @@
> >  sleepq_broadcast(void *wchan, int flags, int pri, int queue)
> >  {
> >        struct sleepqueue *sq;
> > -       struct thread *td;
> > +       struct thread *td, *tdn;
> >        int wakeup_swapper;
> >
> >        CTR2(KTR_PROC, "sleepq_broadcast(%p, %d)", wchan, flags);
> > @@ -793,8 +793,7 @@
> >
> >        /* Resume all blocked threads on the sleep queue. */
> >        wakeup_swapper = 0;
> > -       while (!TAILQ_EMPTY(&sq->sq_blocked[queue])) {
> > -               td = TAILQ_FIRST(&sq->sq_blocked[queue]);
> > +       TAILQ_FOREACH_SAFE(td, &sq->sq_blocked[queue], td_slpq, tdn) {
> >                thread_lock(td);
> >                if (sleepq_resume_thread(sq, td, pri))
> >                        wakeup_swapper = 1;
> >
> > This only uses 'sq' to fetch the head of the queue once up front.  It
> > won't use it again once it has started waking up threads.
>
> I don't know if it is the same problem, but mx2.freebsd.org, running
> today's 6.4-PRERELEASE just died with:
> Sep  3 00:20:11 mx2 sshd[15333]: fatal: Read from socket failed: Connection
> resr panic: Assertion td->td_flags & TDF_SINTR failed at
> ../../../kern/subr_sleepque5 cpuid = 2
> KDB: enter: panic
> FreeBSD 6.4-PRERELEASE #7: Tue Sep  2 19:43:27 UTC 2008
> This was after about 3 hours of uptime.  It has previously run happily
> for months at a time before today's rebuild.

So I think what happened is that the thread was woken up while the sleepq 
chain was unlocked while the thread unlocks the sx lock.  The code handles 
this fine already since the same race can happen when dropping the lock while 
checking for signals.  However, in this case TDF_SINTR won't be true anymore.  
The assertion just needs to be updated.  Try this:

Index: subr_sleepqueue.c
===================================================================
--- subr_sleepqueue.c	(revision 182874)
+++ subr_sleepqueue.c	(working copy)
@@ -382,7 +382,7 @@
 	CTR3(KTR_PROC, "sleepq catching signals: thread %p (pid %ld, %s)",
 		(void *)td, (long)p->p_pid, p->p_comm);
 
-	MPASS(td->td_flags & TDF_SINTR);
+	MPASS((td->td_sleepqueue != NULL) ^ (td->td_flags & TDF_SINTR));
 	mtx_unlock_spin(&sc->sc_lock);
 
 	/* See if there are any pending signals for this thread. */


-- 
John Baldwin


More information about the freebsd-current mailing list