RELENG_6: I/O deadlock under load

Christian S.J. Peron csjp at freebsd.org
Sat Oct 28 19:15:25 UTC 2006


Sorry, I forgot to include the chunk of code from the gmirror worker 
thread which made me suspect this could be the problem:

[..]

                /* Get first request from the queue. */
                mtx_lock(&sc->sc_queue_mtx);
                bp = bioq_first(&sc->sc_queue);
                if (bp == NULL) {
                        if ((sc->sc_flags &
                            G_MIRROR_DEVICE_FLAG_DESTROY) != 0) {
                                mtx_unlock(&sc->sc_queue_mtx);
                                if (g_mirror_try_destroy(sc)) {
                                        curthread->td_pflags &= ~TDP_GEOM;
                                        G_MIRROR_DEBUG(1, "Thread exiting.");
                                        kthread_exit(0);
                                }
                                mtx_lock(&sc->sc_queue_mtx);
                        }
                        sx_xunlock(&sc->sc_lock);
                        /*
                         * XXX: We can miss an event here, because an event
                         *      can be added without sx-device-lock and without
                         *      mtx-queue-lock. Maybe I should just stop using
                         *      dedicated mutex for events synchronization and
                         *      stick with the queue lock?
                         *      The event will hang here until next I/O request
                         *      or next event is received.
                         */
                        MSLEEP(sc, &sc->sc_queue_mtx, PRIBIO | PDROP, "m:w1",
                            timeout * hz);
                        sx_xlock(&sc->sc_lock);
                        G_MIRROR_DEBUG(5, "%s: I'm here 4.", __func__);
                        continue;
                }
                bioq_remove(&sc->sc_queue, bp);
                mtx_unlock(&sc->sc_queue_mtx);

Christian S.J. Peron wrote:
>
> It almost looks as if a user frequently runs gmirror(8) to query the 
> status of their array. Under a high load situation, the worker is 
> busy, so at one unlucky moment, gmirror(8) is run:
>
>    (1) gmirror(8) waits for sc->sc_lock owned by the worker
>    (2) The worker then drops the lock
>    (3) gmirror(8) proceeds
>    (4) Worker wakes up and waits for sc->sc_lock
>    (5) Only gmirror(8) never releases sc->sc_lock, because it's waiting
> on a resource (presumably owned by the worker thread)?
>
> I am not certain this is correct, so I have included pjd in the CC 
> loop, hoping he can help shed some light on the subject :)
>
>
>
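
The five steps quoted above can be walked through in a small userspace model. Again this is a sketch built on assumptions, not gmirror code: two pthread mutexes stand in for sc->sc_lock (really an sx lock) and for the unidentified resource, which I have called worker_res, and hypothesized_deadlock() is a made-up name. Both parties are played by one thread, and pthread_mutex_trylock() is used so the model reports the stall instead of hanging.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/*
 * Replays the suspected ordering. Returns true if, at the end, gmirror(8)
 * and the worker are each waiting on something the other one holds.
 */
bool
hypothesized_deadlock(void)
{
	pthread_mutex_t sc_lock;    /* stands in for sc->sc_lock */
	pthread_mutex_t worker_res; /* ASSUMPTION: the unnamed resource */
	bool gmirror_blocked, worker_blocked;

	pthread_mutex_init(&sc_lock, NULL);
	pthread_mutex_init(&worker_res, NULL);

	/* Starting point: the worker owns both (the second is presumed). */
	pthread_mutex_lock(&worker_res);
	pthread_mutex_lock(&sc_lock);

	/* (1) gmirror(8) wants sc_lock, but the worker still holds it. */
	if (pthread_mutex_trylock(&sc_lock) != EBUSY)
		return (false); /* model broken: lock should be busy */

	/* (2) The worker drops sc_lock before going to sleep. */
	pthread_mutex_unlock(&sc_lock);

	/* (3) gmirror(8) proceeds and now owns sc_lock. */
	pthread_mutex_lock(&sc_lock);

	/* (4) The worker wakes up and wants sc_lock back. */
	worker_blocked = (pthread_mutex_trylock(&sc_lock) == EBUSY);

	/* (5) gmirror(8) needs the resource the worker still holds. */
	gmirror_blocked = (pthread_mutex_trylock(&worker_res) == EBUSY);

	/* Tear down the model. */
	pthread_mutex_unlock(&sc_lock);
	pthread_mutex_unlock(&worker_res);
	pthread_mutex_destroy(&sc_lock);
	pthread_mutex_destroy(&worker_res);

	return (gmirror_blocked && worker_blocked);
}
```

If the real code has this shape, neither side can make progress. The model says nothing about what the second resource actually is, which is the part I am hoping pjd can confirm or rule out.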


