deadlock in ata_queue_request()
Soren Schmidt
sos at spider.deepcore.dk
Thu Dec 25 01:07:50 PST 2003
It seems Bruce Evans wrote:
> ata_queue_request() sleeps in an interrupt handler here:
Yes, I have a local fix to help this, the sleep was originally left in to
make a backport to -stable easier (ie no mutexes), but this need to be
changed here. I'll get it committed asap, but it is hollidays and the
kids has alot of new toys :)
> % void
> % ata_queue_request(struct ata_request *request)
> % {
> % /* mark request as virgin (it might be a reused one) */
> % request->result = request->status = request->error = 0;
> % request->flags &= ~ATA_R_DONE;
> %
> % /* put request on the locked queue at the specified location */
> % mtx_lock(&request->device->channel->queue_mtx);
> % if (request->flags & ATA_R_AT_HEAD)
> % TAILQ_INSERT_HEAD(&request->device->channel->ata_queue, request, chain);
> % else
> % TAILQ_INSERT_TAIL(&request->device->channel->ata_queue, request, chain);
> % mtx_unlock(&request->device->channel->queue_mtx);
> %
> % /* should we skip start ? */
> % if (!(request->flags & ATA_R_SKIPSTART))
> % ata_start(request->device->channel);
> %
> % /* if this was a requeue op callback/sleep already setup */
> % if (request->flags & ATA_R_REQUEUE)
> % return;
> %
> % /* if this is not a callback and we havn't seen DONE yet -> sleep */
> % if (!request->callback) {
> % while (!(request->flags & ATA_R_DONE))
> % tsleep(request, PRIBIO, "atareq", hz/10);
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> % }
> % }
>
> when it is called from an interrupt handler. It is called from an interrupt
> handler as part of timeout processing:
>
> ...
> msleep(...)
> ata_queue_request(...)
> ata_via_family_setmode(...)
> ata_identify_devices(...)
> ata_reinit(...)
> ata_timeout(...)
> softclock(...)
> ithread_loop(...)
> ...
>
> The timeout was called here shortly after ad2 hung:
>
> Dec 25 12:28:27 besplex kernel: ad2: TIMEOUT - READ_DMA retrying (2 retries left)
> Dec 25 12:28:27 besplex kernel: ata1: resetting devices ..
> Dec 25 12:28:27 besplex kernel: ad2: FAILURE - already active DMA on this device
> Dec 25 12:28:27 besplex kernel: ad2: setting up DMA failed
>
> ATA_R_DONE was never set and wakeup_request() was never called either, so
> softclock() was deadlocked and tsleep() never returned.
>
> The system ran surprisingly well with softclock() deadlocked. ad0 worked
> and everything that didn't use timeouts worked. Examples of things that
> didn't work because they use timeouts:
> - syscons screen updates.
> - statistics in top and systat.
> - sleep 1 in shells.
> - mbmon (shows the status, then never repeats).
>
> I tried the following to recover:
> - call wakeup(request) using ddb. This worked, but ATA_R_DONE was never
> set so ata_queue_request() just looped.
> - also ignore the ATA_R_DONE check using ddb. This un-deadlocked
> softclock(), but ata1 remained wedged.
> - then call "atacontrol reinit 1". This partly worked:
>
> Dec 25 14:31:12 besplex kernel: ata1: resetting devices ..
> Dec 25 14:31:44 besplex kernel: ad2: WARNING - removed from configuration
> Dec 25 14:31:44 besplex kernel: done
>
> but the ata driver caused a null pointer panic an instant later.
>
> - ad2 didn't come back after a hard reset.
> - ad2 came back after a power cycle.
>
> This was on an undermydesktop. Problems resuming on laptops may be similar.
> The hardware may really be wedged. Then the software shouldn't make things
> worse by sleeping or spinning in the timeout handler.
>
> Bruce
>
-Søren
Yes I know it works under windows!!
More information about the freebsd-current
mailing list