RELENG_6: I/O deadlock under load
Christian S.J. Peron
csjp at freebsd.org
Sat Oct 28 19:15:25 UTC 2006
Sorry, I forgot to include the chunk of code from the gmirror worker
thread which made me suspect this could be the problem:
[..]
    /* Get first request from the queue. */
    mtx_lock(&sc->sc_queue_mtx);
    bp = bioq_first(&sc->sc_queue);
    if (bp == NULL) {
        if ((sc->sc_flags &
            G_MIRROR_DEVICE_FLAG_DESTROY) != 0) {
            mtx_unlock(&sc->sc_queue_mtx);
            if (g_mirror_try_destroy(sc)) {
                curthread->td_pflags &= ~TDP_GEOM;
                G_MIRROR_DEBUG(1, "Thread exiting.");
                kthread_exit(0);
            }
            mtx_lock(&sc->sc_queue_mtx);
        }
        sx_xunlock(&sc->sc_lock);
        /*
         * XXX: We can miss an event here, because an event
         *      can be added without sx-device-lock and without
         *      mtx-queue-lock. Maybe I should just stop using
         *      dedicated mutex for events synchronization and
         *      stick with the queue lock?
         *      The event will hang here until next I/O request
         *      or next event is received.
         */
        MSLEEP(sc, &sc->sc_queue_mtx, PRIBIO | PDROP, "m:w1",
            timeout * hz);
        sx_xlock(&sc->sc_lock);
        G_MIRROR_DEBUG(5, "%s: I'm here 4.", __func__);
        continue;
    }
    bioq_remove(&sc->sc_queue, bp);
    mtx_unlock(&sc->sc_queue_mtx);
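
To make the XXX comment above a bit more concrete: the problem it describes is a lost wakeup, where something is posted between the worker's check and its sleep. Below is a minimal userland sketch of the usual way to avoid that, using pthreads rather than the kernel primitives. Everything in it (struct softc_sketch, post_event(), wait_for_event()) is a hypothetical stand-in I made up for illustration, not gmirror code. The idea is only that the poster and the sleeper use the same lock, and the sleeper re-checks the condition under that lock before sleeping.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical userland stand-in; not the real struct g_mirror_softc. */
struct softc_sketch {
    pthread_mutex_t queue_mtx;  /* plays the role of sc_queue_mtx */
    pthread_cond_t  wakeup;     /* plays the role of the sleep channel */
    int             pending;    /* posted events, protected by queue_mtx */
};

void
sketch_init(struct softc_sketch *sc)
{
    int error;

    error = pthread_mutex_init(&sc->queue_mtx, NULL);
    assert(error == 0);
    error = pthread_cond_init(&sc->wakeup, NULL);
    assert(error == 0);
    sc->pending = 0;
}

/* Post an event while holding the same lock the sleeper checks under. */
void
post_event(struct softc_sketch *sc)
{
    pthread_mutex_lock(&sc->queue_mtx);
    sc->pending++;
    pthread_cond_signal(&sc->wakeup);
    pthread_mutex_unlock(&sc->queue_mtx);
}

/*
 * Consume one event, sleeping up to timeout_sec seconds for it.
 * The predicate is tested under queue_mtx before every sleep, so an
 * event posted before we get here is seen instead of being missed.
 * Returns true if an event was consumed, false on timeout or error.
 */
bool
wait_for_event(struct softc_sketch *sc, int timeout_sec)
{
    struct timespec deadline;
    bool got;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_sec;

    pthread_mutex_lock(&sc->queue_mtx);
    while (sc->pending == 0) {
        /* Atomically drops queue_mtx while asleep, retakes it on wakeup. */
        if (pthread_cond_timedwait(&sc->wakeup, &sc->queue_mtx,
            &deadline) != 0)
            break;  /* ETIMEDOUT or another error: stop waiting */
    }
    got = (sc->pending > 0);
    if (got)
        sc->pending--;
    pthread_mutex_unlock(&sc->queue_mtx);
    return (got);
}
```

If post_event() changed the counter without taking queue_mtx, the waiter could test the counter, see zero, and go to sleep just after the post, which is roughly the window the comment is worried about. In the kernel excerpt the timeout on MSLEEP bounds how long such a miss could last.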
Christian S.J. Peron wrote:
>
> It almost looks as if a user frequently runs gmirror(8) to query the
> status of their array. Under high load the worker is busy, so at one
> unlucky moment, gmirror(8) is run:
>
> (1) gmirror(8) waits for sc->sc_lock owned by the worker
> (2) The worker then drops the lock
> (3) gmirror(8) proceeds
> (4) Worker wakes up and waits for sc->sc_lock
> (5) gmirror(8) never releases sc->sc_lock, because it is itself
>     waiting on a resource (presumably owned by the worker thread)?
>
> I am not certain this is correct, so I have included pjd in the CC
> loop, hoping he can help shed some light on the subject :)
>
>
>
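
For what it's worth, the suspected sequence in steps (1) to (5) quoted above is a plain wait-for cycle, and it can be written down as a tiny model. This is only a sketch of the hypothesis, not an analysis of the real code: LK_RESOURCE stands for the unidentified "resource" from step (5), and all of the names below are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two threads and two locks in the scenario. */
enum thread_id { T_NONE = -1, T_WORKER = 0, T_GMIRROR = 1, T_COUNT = 2 };
enum lock_id   { LK_NONE = -1, LK_SC_LOCK = 0, LK_RESOURCE = 1, LK_COUNT = 2 };

struct wait_graph {
    enum thread_id owner[LK_COUNT];      /* who holds each lock */
    enum lock_id   waiting_on[T_COUNT];  /* lock each thread is blocked on */
};

/*
 * Follow "thread waits on lock, lock is owned by thread" edges from
 * start. If the chain leads back to start, the threads on it can
 * never make progress.
 */
bool
is_deadlocked(const struct wait_graph *g, enum thread_id start)
{
    enum thread_id t = start;

    assert(start >= 0 && start < T_COUNT);
    for (int hops = 0; hops < T_COUNT; hops++) {
        enum lock_id lk = g->waiting_on[t];

        if (lk == LK_NONE)
            return (false);     /* this thread is runnable */
        t = g->owner[lk];
        if (t == T_NONE)
            return (false);     /* lock is free, the waiter will get it */
        if (t == start)
            return (true);      /* cycle back to where we began */
    }
    return (false);
}

/* State after step (5): each side holds what the other is waiting for. */
struct wait_graph
suspected_scenario(void)
{
    struct wait_graph g = {
        .owner = {
            [LK_SC_LOCK]  = T_GMIRROR,  /* step (3): gmirror(8) got sc_lock */
            [LK_RESOURCE] = T_WORKER,   /* presumed, per step (5) */
        },
        .waiting_on = {
            [T_WORKER]  = LK_SC_LOCK,   /* step (4) */
            [T_GMIRROR] = LK_RESOURCE,  /* step (5) */
        },
    };
    return (g);
}
```

Calling is_deadlocked() on the state returned by suspected_scenario() reports a cycle from either thread. Whether the real code can actually reach that state depends on what the resource in step (5) turns out to be, which is the open question here.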