Faulty hard disk hanging machine?

Metod Kozelj metod.kozelj at rzs-hm.si
Fri Jun 29 09:14:19 PDT 2001


Hello,

I have a presumably faulty hard disk, which can completely hang the
machine.

General data:
Alpha SX164
Linux 2.4.4 with aic7xxx 6.1.13
Adaptec 39160 and 2940UW
IBM UltraStar DDYS-T18350N and other SCSI devices (HDDs, CD-ROM)

When booted with the default SCSI settings, the machine froze under heavy
disk activity. The arrangement (which controller was used, which other
devices shared the SCSI bus) made little difference.

When booted with an option limiting the queue depth to 64, the system
behaved a little better, but still far from well (see the log excerpt at
the end of this mail).
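Whether the depth limit actually took effect can be read from the boot messages, where aic7xxx 6.1.13 prints one "Tagged Queuing enabled.  Depth N" line per tagged device (253, 32 and 64 in the log excerpt below). A minimal Python sketch that collects these values from a saved log, assuming the exact line format shown in this log; `queue_depths` is a hypothetical helper, not part of any existing tool:

```python
import re

# Assumed format, copied from the boot log in this mail, e.g.
# "scsi2:0:2:0: Tagged Queuing enabled.  Depth 64"
DEPTH_RE = re.compile(r"(scsi\d+:\d+:\d+:\d+): Tagged Queuing enabled\.\s+Depth (\d+)")

def queue_depths(log_lines):
    """Return {host:channel:target:lun -> tagged queue depth} from kernel log lines."""
    depths = {}
    for line in log_lines:
        m = DEPTH_RE.search(line)
        if m:
            depths[m.group(1)] = int(m.group(2))
    return depths

if __name__ == "__main__":
    sample = [
        "Jun 29 16:38:29 fractus kernel: scsi0:0:0:0: Tagged Queuing enabled.  Depth 253 ",
        "Jun 29 16:38:29 fractus kernel: scsi2:0:2:0: Tagged Queuing enabled.  Depth 64 ",
    ]
    for dev, depth in queue_depths(sample).items():
        print(dev, depth)
```

Lines that do not match (device detection, partition checks) are simply skipped, so the whole boot log can be fed in unfiltered.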

I suspect that the HDD is defective, as I have 5 HDDs of the same type
humming happily in a similar machine (attached to a 2940UW, but running
Linux 2.2.19 and aic7xxx 6.1.11) without any special setup (such as a
limited queue depth). I just need a good explanation of why the HDD should
be replaced: it has already been sent to a repair centre, and the answer
was that the disk was OK.

Peace!
  Mkx

---- perl -e 'print $i=pack(c5,(41*2),sqrt(7056),(unpack(c,H)-2),oct(115),10);'


Jun 29 16:38:27 fractus kernel: SCSI subsystem driver Revision: 1.00 
Jun 29 16:38:27 fractus kernel: scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.1.13 
Jun 29 16:38:27 fractus kernel:         <Adaptec 3960D Ultra160 SCSI adapter> 
Jun 29 16:38:28 fractus kernel:         aic7899: Ultra160 Wide Channel A, SCSI Id=7, 32/255 SCBs 
Jun 29 16:38:28 fractus kernel:  
Jun 29 16:38:28 fractus kernel: scsi1 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.1.13 
Jun 29 16:38:28 fractus kernel:         <Adaptec 3960D Ultra160 SCSI adapter> 
Jun 29 16:38:28 fractus kernel:         aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/255 SCBs 
Jun 29 16:38:28 fractus kernel:  
Jun 29 16:38:28 fractus kernel: scsi2 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.1.13 
Jun 29 16:38:28 fractus kernel:         <Adaptec 2940 Ultra SCSI adapter> 
Jun 29 16:38:28 fractus kernel:         aic7880: Ultra Wide Channel A, SCSI Id=7, 16/255 SCBs 
Jun 29 16:38:28 fractus kernel:  
Jun 29 16:38:28 fractus kernel:   Vendor: SEAGATE   Model: ST34572N          Rev: 0784 
Jun 29 16:38:29 fractus kernel:   Type:   Direct-Access                      ANSI SCSI revision: 02 
Jun 29 16:38:29 fractus kernel:   Vendor: NEC       Model: CD-ROM DRIVE:465  Rev: 1.03 
Jun 29 16:38:29 fractus kernel:   Type:   CD-ROM                             ANSI SCSI revision: 02 
Jun 29 16:38:29 fractus kernel: scsi0:0:0:0: Tagged Queuing enabled.  Depth 253 
Jun 29 16:38:29 fractus kernel:   Vendor: DEC       Model: RZ28D    (C) DEC  Rev: 0008 
Jun 29 16:38:29 fractus kernel:   Type:   Direct-Access                      ANSI SCSI revision: 02 
Jun 29 16:38:29 fractus kernel: scsi1:0:1:0: Tagged Queuing enabled.  Depth 32 
Jun 29 16:38:29 fractus kernel:   Vendor: IBM       Model: DDYS-T18350N      Rev: S93E 
Jun 29 16:38:29 fractus kernel:   Type:   Direct-Access                      ANSI SCSI revision: 03 
Jun 29 16:38:29 fractus kernel: scsi2:0:2:0: Tagged Queuing enabled.  Depth 64 
Jun 29 16:38:29 fractus kernel: Detected scsi disk sda at scsi0, channel 0, id 0, lun 0 
Jun 29 16:38:29 fractus kernel: Detected scsi disk sdb at scsi1, channel 0, id 1, lun 0 
Jun 29 16:38:29 fractus kernel: Detected scsi disk sdc at scsi2, channel 0, id 2, lun 0 
Jun 29 16:38:30 fractus kernel: (scsi0:A:0): 20.000MB/s transfers (20.000MHz, offset 15) 
Jun 29 16:38:30 fractus kernel: SCSI device sda: 8888924 512-byte hdwr sectors (4551 MB) 
Jun 29 16:38:30 fractus kernel: Partition check: 
Jun 29 16:38:30 fractus kernel:  sda: sda1 sda2 sda3 sda4 sda5 
Jun 29 16:38:30 fractus kernel: (scsi1:A:1): 20.000MB/s transfers (10.000MHz, offset 15, 16bit) 
Jun 29 16:38:30 fractus kernel: SCSI device sdb: 4110480 512-byte hdwr sectors (2105 MB) 
Jun 29 16:38:30 fractus kernel:  sdb: sdb1 sdb3 sdb8 
Jun 29 16:38:30 fractus kernel: (scsi2:A:2): 40.000MB/s transfers (20.000MHz, offset 8, 16bit) 
Jun 29 16:38:30 fractus kernel: SCSI device sdc: 35843670 512-byte hdwr sectors (18352 MB) 
Jun 29 16:38:31 fractus kernel:  sdc: sdc1 sdc3 sdc8 

[ ... ]

Jun 29 17:04:41 fractus kernel: pci_map_sg failed: could not allocate dma page tables 
Jun 29 17:04:42 fractus kernel: scsi2: PCI error Interrupt at seqaddr = 0x7d 
Jun 29 17:04:42 fractus kernel: scsi2: Received a Target Abort 
Jun 29 17:04:42 fractus kernel: scsi2: PCI error Interrupt at seqaddr = 0x7d 
Jun 29 17:04:42 fractus kernel: scsi2: Received a Target Abort 
Jun 29 17:05:12 fractus kernel: scsi2:0:2:0: Attempting to queue an ABORT message 
Jun 29 17:05:12 fractus kernel: scsi2: PCI error Interrupt at seqaddr = 0x7d 
Jun 29 17:05:12 fractus kernel: scsi2: Received a Target Abort 
Jun 29 17:05:12 fractus kernel: scsi2:0:2:0: Cmd aborted from QINFIFO 
Jun 29 17:05:12 fractus kernel: aic7xxx_abort returns 8194 
Jun 29 17:05:22 fractus kernel: scsi2:0:2:0: Attempting to queue an ABORT message 
Jun 29 17:05:22 fractus kernel: scsi2: PCI error Interrupt at seqaddr = 0x7e 
Jun 29 17:05:22 fractus kernel: scsi2: Received a Target Abort 
Jun 29 17:05:22 fractus kernel: scsi2:0:2:0: Cmd aborted from QINFIFO 
Jun 29 17:05:22 fractus kernel: aic7xxx_abort returns 8194 
Jun 29 17:05:22 fractus kernel: scsi2:0:2:0: Attempting to queue an ABORT message 
Jun 29 17:05:22 fractus kernel: scsi2: PCI error Interrupt at seqaddr = 0x7d 
Jun 29 17:05:22 fractus kernel: scsi2: Received a Target Abort 
Jun 29 17:05:22 fractus kernel: (scsi2:A:2:0): Queuing a recovery SCB 
Jun 29 17:05:22 fractus kernel: scsi2:0:2:0: Device is disconnected, re-queuing SCB 
Jun 29 17:05:22 fractus kernel: Recovery code sleeping 
Jun 29 17:05:27 fractus kernel: Recovery code awake 
Jun 29 17:05:27 fractus kernel: Timer Expired 
Jun 29 17:05:27 fractus kernel: aic7xxx_abort returns 8195 
Jun 29 17:05:27 fractus kernel: scsi2:0:2:0: Attempting to queue an ABORT message 
Jun 29 17:05:27 fractus kernel: scsi2: PCI error Interrupt at seqaddr = 0x7e 
Jun 29 17:05:27 fractus kernel: scsi2: Received a Target Abort 
Jun 29 17:05:27 fractus kernel: Recovery SCB completes 
Jun 29 17:05:27 fractus kernel: (scsi2:A:2:0): Queuing a recovery SCB 
Jun 29 17:05:27 fractus kernel: scsi2:0:2:0: Device is disconnected, re-queuing SCB 
Jun 29 17:05:27 fractus kernel: Recovery code sleeping 
Jun 29 17:05:27 fractus kernel: Recovery code awake 
Jun 29 17:05:27 fractus kernel: aic7xxx_abort returns 8194 

//
// Many similar message blocks like the one above were deleted.
// These appeared at regular 10-second intervals.
// Each time the 'Commands Active' counter in /proc/scsi/aic7xxx/2
// decreased by 1.
//

Jun 29 17:15:37 fractus kernel: scsi2:0:2:0: Attempting to queue an ABORT message 
Jun 29 17:15:37 fractus kernel: scsi2: PCI error Interrupt at seqaddr = 0x7e 
Jun 29 17:15:37 fractus kernel: scsi2: Received a Target Abort 
Jun 29 17:15:37 fractus kernel: scsi2:0:2:0: Cmd aborted from QINFIFO 
Jun 29 17:15:37 fractus kernel: aic7xxx_abort returns 8194 
Jun 29 17:15:37 fractus kernel: scsi2:0:2:0: Attempting to queue an ABORT message 
Jun 29 17:15:37 fractus kernel: scsi2: PCI error Interrupt at seqaddr = 0x7d 
Jun 29 17:15:37 fractus kernel: scsi2: Received a Target Abort 
Jun 29 17:15:37 fractus kernel: scsi2:0:2:0: Device is active, asserting ATN 
Jun 29 17:15:37 fractus kernel: Recovery code sleeping 
Jun 29 17:15:42 fractus kernel: Recovery code awake 
Jun 29 17:15:42 fractus kernel: Timer Expired 
Jun 29 17:15:42 fractus kernel: aic7xxx_abort returns 8195 
Jun 29 17:15:42 fractus kernel: scsi2:0:2:0: Attempting to queue a TARGET RESET message 
Jun 29 17:15:42 fractus kernel: scsi2: PCI error Interrupt at seqaddr = 0x7e 
Jun 29 17:15:42 fractus kernel: scsi2: Received a Target Abort 
Jun 29 17:15:42 fractus kernel: scsi2:0:2:0: Device is active, asserting ATN 
Jun 29 17:15:42 fractus kernel: Recovery code sleeping 
Jun 29 17:15:47 fractus kernel: Recovery code awake 
Jun 29 17:15:47 fractus kernel: Timer Expired 
Jun 29 17:15:47 fractus kernel: aic7xxx_dev_reset returns 8195 
Jun 29 17:15:47 fractus kernel: Recovery SCB completes 
Jun 29 17:15:47 fractus kernel: Unable to handle kernel paging request at virtual address 003ffc0000206000 
Jun 29 17:15:47 fractus kernel: scsi_eh_2(9): Oops 1 
Jun 29 17:15:47 fractus kernel: pc = [iommu_arena_free+32/64]  ra = [pci_unmap_sg+292/480]  ps = 0007 
Jun 29 17:15:47 fractus kernel: v0 = 0000000000000001  t0 = 0000000000000001  t1 = 003ffc0000206000 
Jun 29 17:15:47 fractus kernel: t2 = 0000000000000000  t3 = 0000000000000000  t4 = 0000000000000002 
Jun 29 17:15:47 fractus kernel: t5 = fffffc0000200000  t6 = fffffc00009a2690  t7 = fffffc000044c000 
Jun 29 17:15:47 fractus kernel: s0 = 0000000000000001  s1 = 000000000001f000  s2 = fffffc0007f44018 
Jun 29 17:15:47 fractus kernel: s3 = fffffc0000200080  s4 = 0000000000000000  s5 = ffffffffffffffff 
Jun 29 17:15:47 fractus kernel: s6 = fffffc0007f44000 
Jun 29 17:15:47 fractus kernel: a0 = fffffc0000200080  a1 = 0007fffffffffc00  a2 = 0000000000000010 
Jun 29 17:15:47 fractus kernel: a3 = 0000000000000000  a4 = ffffffffffffffff  a5 = 00000000000000ff 
Jun 29 17:15:47 fractus kernel: t8 = 000000000000001f  t9 = fffffc00009c5f08  t10= fffffc00009c7290 
Jun 29 17:15:47 fractus kernel: t11= 0000000100000000  pv = fffffc000081ae40  at = fffffc00009c5880 
Jun 29 17:15:47 fractus kernel: gp = fffffc00009c30b8  sp = fffffc000044fb30 
Jun 29 17:15:47 fractus kernel: Code: 43f209a1  ALU zero,a2,t0 
Jun 29 17:15:47 fractus kernel:  42220642  s8addq a1,t1,t1 
Jun 29 17:15:47 fractus kernel:  e4200008  blt t0,.+36 
Jun 29 17:15:47 fractus kernel:  2fe00000  ldq_u zero,0(v0) 
Jun 29 17:15:47 fractus kernel:  47ff041f  or zero,zero,zero 
Jun 29 17:15:47 fractus kernel:  2fe00000  ldq_u zero,0(v0) 
Jun 29 17:15:47 fractus kernel: *b7e20000  stq zero,0(t1) 
Jun 29 17:15:47 fractus kernel:  40603403  addq t2,1,t2 
Jun 29 17:15:47 fractus kernel:  
Jun 29 17:15:47 fractus kernel: Trace:8cd0fc 8d9878 8d9c80 8ce854 8c40ec 8c49bc 8c4fa8 810610 8c4e40  


At this point, the 'Commands Active' counter in /proc/scsi/aic7xxx/2
reached 0. The system was still somehow alive, but would not shut down
properly: it hung at the point where it should unmount the file systems.
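The abort cycle noted in the log comments can be checked mechanically from a saved syslog. A minimal Python sketch, assuming the syslog line format shown in the excerpt above and assuming the year 2001 (syslog does not record it); `abort_times` and `burst_gaps` are hypothetical helpers:

```python
import re
from datetime import datetime

# Assumed syslog layout, as in the excerpt: "Jun 29 17:05:12 fractus kernel: <message>"
LINE_RE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) \S+ kernel: (.*)$")

def abort_times(log_lines, target="scsi2:0:2:0"):
    """Timestamps of every 'Attempting to queue an ABORT message' for one target."""
    times = []
    for line in log_lines:
        m = LINE_RE.match(line.strip())
        if m and m.group(2).startswith(target + ": Attempting to queue an ABORT"):
            # The year is an assumption; syslog omits it.
            times.append(datetime.strptime("2001 " + m.group(1), "%Y %b %d %H:%M:%S"))
    return times

def burst_gaps(times):
    """Seconds between successive distinct abort timestamps."""
    distinct = sorted(set(times))
    return [int((b - a).total_seconds()) for a, b in zip(distinct, distinct[1:])]
```

For example, `burst_gaps(abort_times(open("/var/log/messages")))`, where the log path is an assumption that depends on the syslog configuration. On the 17:05 block of the excerpt it reports gaps of 10 and 5 seconds; the 5-second gap appears to match the "Recovery code sleeping/awake" timer.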


More information about the aic7xxx mailing list