Process of elimination

Taras M. Dowhaluk tarasd at visiondb.com.au
Mon Feb 2 15:36:08 PST 1998


G'day to the list,

I need to determine whether what I am seeing is a true hardware problem, i.e. 
medium errors, an aic7xxx problem, or something completely different. Any 
assistance would be greatly appreciated.

I administer a FreeBSD 2.2.2 machine with the following SCSI config:

AHA2940UW (id 7)
    |-- Internal Quantum ST6.4S (id 1) ("1")
          |-- Internal Quantum ST6.4S (id 2) ("0")
               |-- Internal Iomega Jaz (id 6).

For the sake of discussion, SCSI ID 1 is physically labelled "1" and SCSI ID 2 
is labelled "0". (Historical).

The events have been as follows:
1. 17Sep1997, drive "0" was the "system" drive, drive "1" was the "data" drive.

2. "0" had medium errors and rapidly deteriorated until the drive was unusable.

3. AHA2940UW low-level verify on "0" reported errors. Replaced "0" with a new 
drive, ran a low-level format, then a verify and all OK, rebuilt system, all 
was OK.

4. 18Nov1997, "0" reported several medium errors again, 1 day later drive was 
saturated with medium errors.

5. Placed an active terminator on end of chain.

6. Low-level verify on "0" reported errors, ran a low-level format, then a 
verify and all OK.

7. Made "1" the system disk and "0" the data disk, swapped SCSI ids, and swapped 
the position of the drives in the bay so that it is as per the above diagram.

8. 21Jan1998, "0" fails again. Day 1 of failure several MEDIUM errors 
(info?:661041 asc:11,0 Unrecoverable read errors). Next day disk is saturated 
with these errors.

9. /sbin/scsiformat doesn't remove these errors; only a low-level SCSI format 
does.


There have been 2 disks which failed in a similar manner, with the second 
failing twice.
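
For what it's worth, the spacing of the failures can be checked with a short 
Python sketch. It assumes that 17Sep1997 is the date of the first failure 
(the list above gives it as the date of the original configuration, so this is 
an assumption), and the names used are purely illustrative.

```python
from datetime import date

# Dates taken from the event list above. Treating 17Sep1997 as the date of
# the first failure is an assumption, not something the log states outright.
failures = [date(1997, 9, 17), date(1997, 11, 18), date(1998, 1, 21)]


def gaps_in_days(dates):
    """Return the number of days between each consecutive pair of dates."""
    return [(later - earlier).days for earlier, later in zip(dates, dates[1:])]


intervals = gaps_in_days(failures)
print(intervals)  # [62, 64]
```

Under that assumption the failures are 62 and 64 days apart, i.e. roughly 
every two months.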


My questions are these:
1. This doesn't seem to be physical medium failure; if it were, I would have 
expected the errors to be hard, i.e. permanent. Hence, is this a case where the 
soft sector format of the disk is being corrupted?

2. If it is a format corruption, what is doing it? Am I correct in my 
assumption that it's probably NOT the OS, since "0" has been in 2 different 
positions and uses, and "1" is running quite fine?

3. Just to "tick the box", should I upgrade to 2.2.5 (I believe that's the 
latest)? Where would I find the latest, safest, most stable aic7xxx driver?

4. I can't go and swap all the possible culprits, i.e. aha2940uw, power 
supply, cables, drives, mainly because of the time it takes for the failures 
to happen, i.e. regularly every 2 months. Has anyone seen anything similar?

5. Besides testing it electrically, how else can I tell if the terminator I 
have is active or passive? There are no markings on it.

Any advice appreciated.


regards, taras

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Taras M. Dowhaluk
Director - Technical Operations
VisionDB Pty Ltd
Sydney, Australia
email: tarasd at visiondb.com.au
www:   http://www.visiondb.com.au
www:   http://www.biz.com.au
voice: +61 2 9922 6615
fax:   +61 2 9907 1078
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


