HELP

Thu Feb 12 05:28:03 PST 1998

Richard Couture wrote:
> 
> I am running Linux kernel 2.0.33 with the aic7xxx patch 5.0.5/3.2.2
> with an Adaptec 2940U adapter.
> I was running an older version of the driver and had problems, so I
> got the newer driver to see if the problems would go away, but they
> did not.
> when I boot, dmesg reports the following:
> 
> [BEGIN DMESG RE:AIC]
>   (scsi0) <Adaptec AHA-294X Ultra SCSI host adapter> found at PCI 10/0
>   (scsi0) Narrow Channel, SCSI ID=7, 16/255 SCBs
>   (scsi0) BIOS enabled, IO Port 0xf800, IO Mem 0xffbef000, IRQ 11
>   (scsi0) Downloading sequencer code... 406 instructions downloaded
>   scsi0 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.0.5/3.2.2
>        <Adaptec AHA-294X Ultra SCSI host adapter>
>   scsi : 1 host.
>   (scsi0:0:-1:-1) Scanning channel for devices.
>   (scsi0:0:0:0) Synchronous at 10.0MHz, offset 15.
>     Vendor: CONNER    Model: CFP1060S 1.05GB   Rev: 2035
>     Type:   Direct-Access                      ANSI SCSI revision: 02
>   Detected scsi disk sda at scsi0, channel 0, id 0, lun 0
>   (scsi0:0:1:0) Synchronous at 10.0MHz, offset 15.
>     Vendor: CONNER    Model: CFP4207S  4.28GB  Rev: 1524
>     Type:   Direct-Access                      ANSI SCSI revision: 02
>   Detected scsi disk sdb at scsi0, channel 0, id 1, lun 0
>   (scsi0:0:3:0) Synchronous at 6.67MHz, offset 15.
>     Vendor: ARCHIVE   Model: Python 28388-XXX  Rev: 5.45
>     Type:   Sequential-Access                  ANSI SCSI revision: 02
>   Detected scsi tape st0 at scsi0, channel 0, id 3, lun 0
>   scsi : detected 1 SCSI tape 2 SCSI disks total.
>   SCSI device sda: hdwr sector= 512 bytes. Sectors= 2074880 [1013 MB]
>   [1.0 GB]
>   SCSI device sdb: hdwr sector= 512 bytes. Sectors= 8388608 [4096 MB]
>   [4.1 GB]
> [END DMESG RE:AIC]

This all looks right and good.

> 
> When I back up the system I get the following errors:
> 
> [BEGIN ERROR REPORT]
>   st0: Error with sense data: extra data not valid Current error
>   st09:00: sense key Aborted Command
>   Additional sense indicates Operator medium removal request

Were there no messages prior to this one?  If not, make sure your kernel is
compiled with verbose SCSI error reporting so we can tell if the SCSI mid
level code sent an abort request.

>   st0: Error with sense data: extra data not valid Current error
>   st09:00: sense key Not Ready
>   Additional sense indicates Medium not present

This makes sense.  It looks like an earlier command ejected the tape, and
then another command tried to read a tape that was no longer there.

>   st0: Error with sense data: extra data not valid Current error
>   st09:00: sense key Aborted Command
>   Additional sense indicates Operator medium removal request

Same as the first error.  Again, it would be best to know if there was an
actual abort command sent by the mid level code or if the Aborted Command is
referring to something else (which I suspect it is, such as an eject with no
tape in the drive).

>   (scsi0:0:0:0) Data overrun detected in Data-In phase, tag 14;
>     Have seen Data Phase. Length=28672, NumSGs=5.
>        sg[0] - Addr 0xb32000 : Length 4096
>        sg[1] - Addr 0xb37000 : Length 12288
>        sg[2] - Addr 0xb3b000 : Length 4096
>        sg[3] - Addr 0xb3e000 : Length 4096
>        sg[4] - Addr 0xb40000 : Length 4096

I see these occasionally from certain drives, usually under heavy load.
Normally, they are nothing to worry about.
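
As a sanity check on the overrun message itself, the scatter-gather segment
lengths can be summed and compared with the reported Length.  The sketch
below is only an illustration: the struct and function names are made up
for this example and are not taken from the driver, and it assumes that
Length in the log is the total size of the request in bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One scatter-gather segment as printed in the overrun message.
 * Hypothetical type for this example, not the driver's own struct. */
struct sg_entry {
    uint32_t addr;    /* address of the segment */
    uint32_t length;  /* segment length in bytes */
};

/* The five segments copied from the log above. */
static const struct sg_entry overrun_sg[] = {
    {0xb32000, 4096},
    {0xb37000, 12288},
    {0xb3b000, 4096},
    {0xb3e000, 4096},
    {0xb40000, 4096},
};

/* Add up the lengths of n scatter-gather segments. */
static uint32_t sg_total_length(const struct sg_entry *sg, size_t n)
{
    assert(sg != NULL);
    uint32_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += sg[i].length;
    return total;
}
```

Calling sg_total_length(overrun_sg, 5) gives 28672, which matches the
Length=28672 in the message.  So the request as logged is internally
consistent, and the overrun means more data arrived in the Data-In phase
than those 28672 bytes.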

>   scsi : aborting command due to timeout : pid 313876, scsi0, channel 0,
>   id 0, lun 0 Read (6) 0d f0 70 5a 00
>   scsi : aborting command due to timeout : pid 313874, scsi0, channel 0,
>   id 0, lun 0 Read (6) 0d f0 18 38 00
>   scsi : aborting command due to timeout : pid 313878, scsi0, channel 0,
>   id 0, lun 0 Read (6) 16 84 e6 02 00

This usually indicates a drive that is either wedged itself or has wedged
the bus.
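
To see which blocks the timed-out reads were after, the bytes in the log
can be decoded by hand or with a few lines of C.  This is a sketch under
two assumptions: that the five bytes printed after "Read (6)" are the CDB
bytes following the opcode, and that they follow the standard Read(6)
layout (3-bit LUN and 21-bit block address, then a one-byte transfer
length, then the control byte).  The names read6 and decode_read6 are
invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded fields of a Read(6) command (hypothetical helper type). */
struct read6 {
    unsigned lun;     /* top 3 bits of the first byte after the opcode */
    uint32_t lba;     /* 21-bit logical block address */
    unsigned blocks;  /* transfer length; a 0 byte means 256 blocks */
};

/* b[] holds the five bytes that follow the Read(6) opcode. */
static struct read6 decode_read6(const uint8_t b[5])
{
    assert(b != NULL);
    struct read6 r;
    r.lun = b[0] >> 5;
    r.lba = ((uint32_t)(b[0] & 0x1f) << 16) | ((uint32_t)b[1] << 8) | b[2];
    r.blocks = b[3] ? b[3] : 256;
    return r;
}
```

Under those assumptions, "0d f0 70 5a 00" decodes to LUN 0, block 913520,
90 blocks; "0d f0 18 38 00" to block 913432, 56 blocks; and
"16 84 e6 02 00" to block 1475814, 2 blocks.  All three fall within the
2074880 sectors reported for sda, so the requests themselves look sane.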

>   (scsi0:0:0:0) No active SCB for reconnecting target - Issuing BUS
>   DEVICE RESET.

I've seen one other of these since the 5.0.5 release, and this indicates an
error in the driver somewhere.  I'm currently hunting for it.  However, the
hunt would be much easier if I had more information :)  For instance, it
looks like the system was booted without the aic7xxx=verbose option: the
timeouts above should have resulted in calls to aic7xxx_abort, but I'm not
seeing any of those calls in the log.

>   (scsi0:0:0:0)       SAVED_TCL=0x0, ARG_1=0xe, SEQADDR=0x100
>   (scsi0:0:0:0) Synchronous at 10.0MHz, offset 15.
>   st0: Error with sense data: Current error st09:00: sense key Medium
>   Error
>   Additional sense indicates Sequential positioning error

Either a bad tape, or the equivalent of an "mt -f /dev/st0 fsf x" command
where x is too high.  In other words, the software has tried to space
forward past x filemarks and there weren't that many filemarks on the tape.

>   st0: Error with sense data: Current error st09:00: sense key Medium
>   Error
>   Additional sense indicates Sequential positioning error
>   st0: Error with sense data: Current error st09:00: sense key Medium
>   Error
>   Additional sense indicates Sequential positioning error
>   st0: Error with sense data: Current error st09:00: sense key Medium
>   Error
>   Additional sense indicates Sequential positioning error
> [END ERROR REPORT]
> 
> Here is what /proc/scsi has to say:
> 
> [BEGIN /proc/scsi]
>   Attached devices:
>   Host: scsi0 Channel: 00 Id: 00 Lun: 00
>     Vendor: CONNER   Model: CFP1060S 1.05GB  Rev: 2035
>     Type:   Direct-Access                    ANSI SCSI revision: 02
>   Host: scsi0 Channel: 00 Id: 01 Lun: 00
>     Vendor: CONNER   Model: CFP4207S  4.28GB Rev: 1524
>     Type:   Direct-Access                    ANSI SCSI revision: 02
>   Host: scsi0 Channel: 00 Id: 03 Lun: 00
>     Vendor: ARCHIVE  Model: Python 28388-XXX Rev: 5.45
>     Type:   Sequential-Access                ANSI SCSI revision: 02
> [END /proc/scsi]
> 
> here is what /proc/scsi/0 has to say:
> 
> [BEGIN /proc/scsi/0]
>   Adaptec AIC7xxx driver version: 5.0.5/3.2.2
>   Compile Options:
>     AIC7XXX_RESET_DELAY    : 15
>     AIC7XXX_TAGGED_QUEUEING: Enabled (This is no longer an option)
>       See AIC7XXX_TAGGED_QUEUEING_BY_DEVICE in the file
>       drivers/scsi/aic7xxx.c to disable tagged queueing on
>       problematic devices.
>     AIC7XXX_PAGE_ENABLE    : Enabled (This is no longer an option)
>     AIC7XXX_PROC_STATS     : Disabled
> 
>   Adapter Configuration:
>              SCSI Adapter: Adaptec AHA-294X Ultra SCSI host adapter
>                              Narrow Controller
>                   Base IO: 0xf800
>            Base IO Memory: 0xffbef000
>       BIOS Memory Address: 0x0
>                            Enabled
>                       IRQ: 11
>                      SCBs: Active 0, Max Active 17,
>                            Allocated 30, HW 16, Page 255
>                Interrupts: 348417
>      Extended Translation: Disabled
>            SCSI Bus Reset: Enabled
>   Disconnect Enable Flags: 0x00ff
>    Tag Queue Enable Flags: 0x0003
>   Ordered Queue Tag Flags: 0x0003
>         BIOS Control Word: 0x1034
>      Adapter Control Word: 0x0018
> [END /proc/scsi/0]
> 
> In looking at the code in aic7xxx.c, I suspect that maybe I should
> uncomment:
> 
> /* #define AIC7XXX_TAGGED_QUEUEING_BY_DEVICE */
> then change:
> 
> adapter_tag_info_t aic7xxx_tag_info[] =
> {
>   {DEFAULT_TAG_COMMANDS},
>   {{4, 0, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, -1, 4, 4, 4}},
>   {DEFAULT_TAG_COMMANDS},
>   {{-1, 16, 4, -1, 16, 4, 4, 4, 127, 4, 4, 4, 4, 4, 4, 4}}
> };
> to something like:
> 
> adapter_tag_info_t aic7xxx_tag_info[] =
> {
>   {{-1, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}},
>   {{4, 0, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, -1, 4, 4, 4}},
>   {DEFAULT_TAG_COMMANDS},
>   {{-1, 16, 4, -1, 16, 4, 4, 4, 127, 4, 4, 4, 4, 4, 4, 4}}
> };
> 
> but I hesitate to take such steps without knowing if I am on the right
> track, and knowing what the ramifications might be, and knowing if
> this a known problem with this particular configuration.

This likely won't help.  You might reduce the OVERRIDE_CMDS_PER_LUN value in
the make {menu,x}config screen, and that should be sufficient.  The Conner
HD doesn't appear to have problems with tagged queueing in general, so
completely disabling it isn't called for.  On the other hand, some devices
do have problems with high queue depths.  Usually the driver will accommodate
these devices by reducing the queue depth just on those devices, but to know
if this has happened, you'll have to boot with the aic7xxx=verbose option.
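
For reading the flag words in the /proc/scsi/0 output above, here is a
small sketch that lists which target IDs a mask covers.  It rests on an
assumption that should be checked against the driver source: that bit n of
"Tag Queue Enable Flags" and "Disconnect Enable Flags" corresponds to
target ID n.  The function name targets_in_mask is made up for this
example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store the position of every set bit of mask in ids[] and return how
 * many were found.  ASSUMPTION: bit n stands for SCSI target ID n. */
static int targets_in_mask(uint16_t mask, int ids[16])
{
    assert(ids != NULL);
    int n = 0;
    for (int id = 0; id < 16; id++) {
        if (mask & (1u << id))
            ids[n++] = id;
    }
    return n;
}
```

If the assumption holds, the reported 0x0003 selects IDs 0 and 1, which
are the two Conner disks, and leaves the tape drive at ID 3 without tagged
queueing, while the disconnect mask 0x00ff covers IDs 0 through 7.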

> I am willing to change hardware if that might be a more elegant
> solution, but I am not clear, from the info that I see, as to which
> device is really causing the problems.
> Thank you for your patience in reading through all of this stuff, and
> thank you in advance for your help and suggestions.

Well, the only hardware I would try changing at the moment is your tape
drive.  Beyond that, just try to replicate these errors after rebuilding
your kernel with verbose SCSI reporting (if it isn't already enabled) and
after rebooting with aic7xxx=verbose (or aic7xxx=verbose:0xffff for even
more info) and then send me the full logs of what happened.  From there, I
should be able to figure it out.

-- 

 Doug Ledford  <dledford at dialnet.net>
  Opinions expressed are my own, but
     they should be everybody's.
