Tape drive needs a BUS_DEVICE_RESET but isn't getting one

Sun Dec 13 16:21:58 PST 1998

Randy Gobbel wrote:
> 
> I'm now joining the ranks of the people having trouble with tape drives.  I
> downloaded the shareware version of Arkeia.  Looks very nice, but it causes my
> tape drive to hang so hard that only a reboot brings it back to life.  The
> application appears to be giving some sort of Write command, which times out
> (after a very long time).  This triggers an Abort, but that's not enough--the
> device is still hosed.  I think a BUS_DEVICE_RESET would get it unstuck, but
> the driver isn't trying that, and I haven't found any obvious way to force this
> to happen.  Here are the error messages I'm getting from the timeout:
> 
> Dec 13 03:04:54 gigan kernel: scsi : aborting command due to timeout : pid
> 221414, scsi0, channel 0, id 4, lun 0 Write (6) 01 00 00 40 00
> Dec 13 03:04:54 gigan kernel: (scsi0:0:4:0) Aborting scb 24, flags 0x4
> Dec 13 03:04:54 gigan kernel: (scsi0:0:4:0) SCB disconnected.  Queueing Abort
> SCB.
> Dec 13 03:04:55 gigan kernel: st0: Error 26030000 (sugg. bt 0x20, driver bt
                                           ^^^^^^^^
> 0x26, host bt 0x3).

That error indicates that we attempted to reach the device with a queued
ABORT message, and it never connected to the device because of a
SELECTION TIMEOUT.  If we are getting SELTO on the queued abort command,
then a queued BDR would do the same thing.  If we can't get the device
to respond to the arbitration/selection phases then we can't do anything
with it.  The only thing there that *might* work is a full bus reset. 
The bug in the driver, so to speak, is in how it handled the SELTO on the
queued abort.  I should have picked up that this was a queued abort and
not the original command, and just dropped it instead of completing it
back to the mid level SCSI code.  (I would still need to do some cleanup
after the queued abort, but I shouldn't let it count as the original
command and get sent back to the mid level SCSI code.)  That way the
original command would time out again and result in a bus reset.
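
For anyone wanting to read the marked error word themselves, here is a
minimal sketch of how it splits into bytes.  The layout (status, message,
host, driver, from low byte to high) is an assumption taken from the mid
level SCSI headers, and the function name is made up for illustration, but
the result agrees with the three values st printed in the log line above.

```python
# Sketch only: split a SCSI result word into its four bytes.
# Assumed layout: bits 0-7 status, 8-15 message, 16-23 host, 24-31 driver.
# decode_scsi_result is a hypothetical helper, not a kernel function.

def decode_scsi_result(result):
    driver = (result >> 24) & 0xFF
    return {
        "status": result & 0xFF,
        "msg": (result >> 8) & 0xFF,
        "host": (result >> 16) & 0xFF,
        "driver": driver,
        # High nibble of the driver byte, reported by st as "sugg. bt".
        "suggest": driver & 0xF0,
    }

fields = decode_scsi_result(0x26030000)
for name in ("suggest", "driver", "host", "msg", "status"):
    print(f"{name:8s} 0x{fields[name]:02x}")
```

The suggest, driver and host values come out as 0x20, 0x26 and 0x03, the
same numbers as in the st0 message.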

> Of course I also need to figure out what the application is doing that causes
> the initial hangup, but there certainly should be some way to recover without
> rebooting.
> 
> Is there some way to force a BUS_DEVICE_RESET that I don't know about?  Any
> suggestions appreciated.

I'll fix that for a 5.1.7 driver.  FWIW, when a queued abort command
can't make it to the device, the aic7xxx driver already recognizes that
fact, and when the mid level SCSI code calls into the reset function for
a BDR we escalate the action to a full bus reset.  I just missed the case
of SELTO, since a SELTO usually happens when the bus is completely wedged,
not when it's operable.
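
Roughly, the intended behavior looks like the sketch below.  This is a
simplified model of the decisions described above, not the driver's actual
code: Action, Target, queued_abort_done and reset_requested are all
hypothetical names invented for the illustration.

```python
from enum import Enum

class Action(Enum):
    ABORT = "abort"
    BUS_DEVICE_RESET = "bus device reset"
    BUS_RESET = "bus reset"

class Target:
    """Hypothetical per-device recovery state, not a real driver structure."""
    def __init__(self):
        self.abort_undelivered = False

def queued_abort_done(target, selection_timed_out):
    """Return True if the completion should go back to the mid level."""
    if selection_timed_out:
        # The abort never reached the device.  Remember that and drop it,
        # so the original command is left to time out again.
        target.abort_undelivered = True
        return False
    return True

def reset_requested(target, requested):
    """Choose the action taken when the mid level asks for a reset."""
    if requested is Action.BUS_DEVICE_RESET and target.abort_undelivered:
        # A queued BDR would hit the same selection timeout.
        return Action.BUS_RESET
    return requested

tape = Target()
print(queued_abort_done(tape, selection_timed_out=True))
print(reset_requested(tape, Action.BUS_DEVICE_RESET).value)
```

With a selection timeout on the queued abort, the completion is dropped and
the later BDR request is turned into a full bus reset.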

-- 
  Doug Ledford   <dledford at redhat.com>
   Opinions expressed are my own, but
      they should be everybody's.
