Async negotiation (Was: 2.0.36/5.1.4 and BIOS settings)

Doug Ledford dledford at redhat.com
Fri Dec 11 22:21:21 PST 1998


Dirk Lutzebaeck wrote:
> 
> Doug Ledford writes:
>  > drive in question.  If that FAQ says to always use async mode regardless
>  > of the make or model of drive then they need to bite me.  There are lots
>  > of DAT drives out there that work great at sync speeds.  HP and SONY tend
>  > to run forever sync.  If Seagate DATs have a problem in specific then
>  > that's another matter, but last I knew, Seagate only says to disable sync
>  > negotiation on the model ST8000N drives or something like that.  It's a
> 
> Doug, I was wrong to say "all drives". This is from "System Lockup or
> Freezing problems when running the tape backup" of the Seagate site: (1)
> 
>   "Synchronous Negotiation should be disabled for all Python
>    tape drives. Leave Synchronous Negotiation enabled for the
>    Peregrine or Scorpion tape drives."
> 
> What confuses me is that my Scorpion STD28000 has a dip switch
> to make the ID-string say it is an ARCHIVE (ie. Python) tape. So it
> would then need to run in async mode?
> 
> (1) http://www.seagate.com/support/tape/scsiide/sublinks/s4m_lock.shtml
> 
> The real problem I have is this during a backup session: (2.0.36, ASUS
> P2B-DS 7890 on board)
> 
> Nov 18 13:56:24 kamet kernel: scsi : aborting command due to timeout : pid 186495, scsi0, channel 0, id 0, lun 0 Read (10) 00 00 76 76 95 00 00 98 00
                                                                             ^^^^^^
> Nov 18 13:56:25 kamet kernel: scsi : aborting command due to timeout : pid 186496, scsi0, channel 0, id 0, lun 0 Write (6) 00 80 65 02 00
> 
> [... this goes on and on until ...]
> 
> Nov 18 14:11:04 kamet kernel: scsi : aborting command due to timeout : pid 186495, scsi0, channel 0, id 0, lun 0 Read (10) 00 00 76 76 95 00 00 98 00
> Nov 18 14:11:06 kamet kernel: SCSI host 0 abort (pid 186494) timed out - resetting
                                                       ^^^^^^
> Nov 18 14:11:06 kamet kernel: SCSI bus is being reset for host 0 channel 0.
> Nov 18 14:11:06 kamet kernel: (scsi0:0:0:0) Synchronous at 20.0 Mbyte/sec, offset 15.
> Nov 18 14:11:24 kamet kernel: st0: Error 26030000.
> Nov 18 14:11:26 kamet kernel: st0: Error 26030000.
> 
> This is my IBM DDRS-UW on id 0 gets a timeout and after 5min (!) the
> kernel seems to issue a bus reset. The strange thing is that this always
> happens with amanda but never with just a simple tar on st0 (of
> several 100MBs).

OK, here's what I think is happening.  The aborts that you are seeing
are all for the hard disk, up to the very last one, and they all
succeeded.  The very last one, for pid 186494, is for the tape device.
It's actually been wedging the SCSI bus for 15 minutes now, but because
there were no outstanding commands to the hard drive at the time the bus
got wedged, the repeated aborts of hard drive commands have been finding
those commands in the qinfifo before they ever got sent to the drive.
So we have spent the last 15 minutes going through a cycle of: time out
disk command, successfully abort disk command *without* using a bus
reset (that's the important part), resend command, time out, ad
nauseam.  The tape command has the normal tape timeout of 15 minutes.
That's why we spend 15 minutes (900 seconds) doing this little loop
dance until the tape times out, then we reset the bus, the drive goes on
and works as normal, and the Amanda backup is hosed.
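
As a rough sanity check on the 15 minute figure, the gap between the
first abort shown in the quoted log and the final bus reset can be
worked out from the two timestamps.  This is only a sketch: the helper
name elapsed_seconds is made up for illustration, the year 1998 is an
assumption (syslog lines carry no year), and the first abort in the
excerpt may not be the first one that actually happened.

```python
from datetime import datetime

# Syslog timestamps have no year; 1998 is assumed here purely so the
# two strings can be parsed into full dates.
ASSUMED_YEAR = "1998"
FMT = "%Y %b %d %H:%M:%S"


def elapsed_seconds(first: str, last: str) -> int:
    """Seconds between two syslog timestamps (hypothetical helper)."""
    t0 = datetime.strptime(f"{ASSUMED_YEAR} {first}", FMT)
    t1 = datetime.strptime(f"{ASSUMED_YEAR} {last}", FMT)
    return int((t1 - t0).total_seconds())


first_abort = "Nov 18 13:56:24"  # first abort shown in the quoted excerpt
bus_reset = "Nov 18 14:11:06"    # "abort (pid 186494) timed out - resetting"

stall = elapsed_seconds(first_abort, bus_reset)
print(stall, "seconds stalled, against a 900 second tape timeout")
```

The result lands a little under 900 seconds, which is what you would
expect if the tape command was issued shortly before the first disk
abort in the excerpt.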

Now, having said that, the Amanda backup program appears to cause
several problems, and this one especially appears to be an
Amanda-related problem if tars work fine.  Not only that, but the st0
error code after the reset indicates DID_TIME_OUT, so the tape drive has
essentially gone off-line and isn't responding to our attempts at
selection any more.  That's almost a full 20 seconds after the reset,
and the tape drive still won't select.  I would think this might very
well be an Amanda issue.  I would make sure of a few things.  One, make
sure disconnection is enabled on all devices in the Adaptec BIOS setup.
Two, make sure that tagged queueing is enabled on your hard drive (check
the README.aic7xxx file).  Three, grab my 5.1.6 driver patch (for shits
and grins, just in case there's something wrong with the version you are
using now).  After you do that, see what happens.  If Amanda still
pukes, then I would start talking to the Amanda authors to see what
might be up.  Find out if they do anything special with sg devices, or
special scsi_ioctl calls to pass through commands the device might not
like, that sort of thing.
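
For anyone wondering how "st0: Error 26030000" turns into DID_TIME_OUT,
here is a small sketch of how the result word can be split up.  It
assumes the usual Linux SCSI layout of four bytes (status, message,
host, driver, from low to high) and a DID_TIME_OUT value of 0x03; the
function name decode_result is made up for illustration and is not
anything in the kernel or the driver.

```python
# Assumed value of the host byte for a timed-out command.
DID_TIME_OUT = 0x03


def decode_result(result: int) -> dict:
    """Split a 32-bit SCSI result word into its four bytes (hypothetical helper)."""
    return {
        "status": result & 0xFF,          # SCSI status byte from the target
        "msg": (result >> 8) & 0xFF,      # message byte
        "host": (result >> 16) & 0xFF,    # host adapter / mid-layer code
        "driver": (result >> 24) & 0xFF,  # driver code and suggested action
    }


fields = decode_result(0x26030000)  # value taken from the st0 log lines
print({name: hex(value) for name, value in fields.items()})
print("host byte is DID_TIME_OUT:", fields["host"] == DID_TIME_OUT)
```

The status byte is zero, which fits: the drive never answered, so there
is no target status to report, only the timeout from the host side.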

-- 
  Doug Ledford   <dledford at redhat.com>
   Opinions expressed are my own, but
      they should be everybody's.
