Faulty hard disk hanging machine?
Metod Kozelj
metod.kozelj at rzs-hm.si
Fri Jun 29 09:14:19 PDT 2001
Hello,
I have a presumably faulty hard disk, which can completely hang the
machine.
General data:
Alpha SX164
Linux 2.4.4 with aic7xxx 6.1.13
Adaptec 39160 and 2940UW
IBM UltraStar DDYS-T18350N and other SCSI devices (HDDs, CD-ROM)
If booted with default SCSI settings, the machine would freeze during
heavier disk activity. The arrangement (controller used, other devices on
the same SCSI bus) didn't matter much.
If booted with an option limiting the queue depth to 64, the system behaved
a little better, but still not well at all (see the log excerpt at the end
of this mail).
I suspect that the HDD is defective, as I have 5 HDDs of the same type humming
happily in a similar machine (hanging off a 2940UW, but with Linux 2.2.19 and
aic7xxx 6.1.11) without any special setup (such as limited queue depth). I
just have to come up with a good explanation of why the HDD should
be replaced. It has already been sent to the repair centre, and the answer
was that the disk was OK.
Peace!
Mkx
---- perl -e 'print $i=pack(c5,(41*2),sqrt(7056),(unpack(c,H)-2),oct(115),10);'
Jun 29 16:38:27 fractus kernel: SCSI subsystem driver Revision: 1.00
Jun 29 16:38:27 fractus kernel: scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.1.13
Jun 29 16:38:27 fractus kernel: <Adaptec 3960D Ultra160 SCSI adapter>
Jun 29 16:38:28 fractus kernel: aic7899: Ultra160 Wide Channel A, SCSI Id=7, 32/255 SCBs
Jun 29 16:38:28 fractus kernel:
Jun 29 16:38:28 fractus kernel: scsi1 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.1.13
Jun 29 16:38:28 fractus kernel: <Adaptec 3960D Ultra160 SCSI adapter>
Jun 29 16:38:28 fractus kernel: aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/255 SCBs
Jun 29 16:38:28 fractus kernel:
Jun 29 16:38:28 fractus kernel: scsi2 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.1.13
Jun 29 16:38:28 fractus kernel: <Adaptec 2940 Ultra SCSI adapter>
Jun 29 16:38:28 fractus kernel: aic7880: Ultra Wide Channel A, SCSI Id=7, 16/255 SCBs
Jun 29 16:38:28 fractus kernel:
Jun 29 16:38:28 fractus kernel: Vendor: SEAGATE Model: ST34572N Rev: 0784
Jun 29 16:38:29 fractus kernel: Type: Direct-Access ANSI SCSI revision: 02
Jun 29 16:38:29 fractus kernel: Vendor: NEC Model: CD-ROM DRIVE:465 Rev: 1.03
Jun 29 16:38:29 fractus kernel: Type: CD-ROM ANSI SCSI revision: 02
Jun 29 16:38:29 fractus kernel: scsi0:0:0:0: Tagged Queuing enabled. Depth 253
Jun 29 16:38:29 fractus kernel: Vendor: DEC Model: RZ28D (C) DEC Rev: 0008
Jun 29 16:38:29 fractus kernel: Type: Direct-Access ANSI SCSI revision: 02
Jun 29 16:38:29 fractus kernel: scsi1:0:1:0: Tagged Queuing enabled. Depth 32
Jun 29 16:38:29 fractus kernel: Vendor: IBM Model: DDYS-T18350N Rev: S93E
Jun 29 16:38:29 fractus kernel: Type: Direct-Access ANSI SCSI revision: 03
Jun 29 16:38:29 fractus kernel: scsi2:0:2:0: Tagged Queuing enabled. Depth 64
Jun 29 16:38:29 fractus kernel: Detected scsi disk sda at scsi0, channel 0, id 0, lun 0
Jun 29 16:38:29 fractus kernel: Detected scsi disk sdb at scsi1, channel 0, id 1, lun 0
Jun 29 16:38:29 fractus kernel: Detected scsi disk sdc at scsi2, channel 0, id 2, lun 0
Jun 29 16:38:30 fractus kernel: (scsi0:A:0): 20.000MB/s transfers (20.000MHz, offset 15)
Jun 29 16:38:30 fractus kernel: SCSI device sda: 8888924 512-byte hdwr sectors (4551 MB)
Jun 29 16:38:30 fractus kernel: Partition check:
Jun 29 16:38:30 fractus kernel: sda: sda1 sda2 sda3 sda4 sda5
Jun 29 16:38:30 fractus kernel: (scsi1:A:1): 20.000MB/s transfers (10.000MHz, offset 15, 16bit)
Jun 29 16:38:30 fractus kernel: SCSI device sdb: 4110480 512-byte hdwr sectors (2105 MB)
Jun 29 16:38:30 fractus kernel: sdb: sdb1 sdb3 sdb8
Jun 29 16:38:30 fractus kernel: (scsi2:A:2): 40.000MB/s transfers (20.000MHz, offset 8, 16bit)
Jun 29 16:38:30 fractus kernel: SCSI device sdc: 35843670 512-byte hdwr sectors (18352 MB)
Jun 29 16:38:31 fractus kernel: sdc: sdc1 sdc3 sdc8
[ ... ]
Jun 29 17:04:41 fractus kernel: pci_map_sg failed: could not allocate dma page tables
Jun 29 17:04:42 fractus kernel: scsi2: PCI error Interrupt at seqaddr = 0x7d
Jun 29 17:04:42 fractus kernel: scsi2: Received a Target Abort
Jun 29 17:04:42 fractus kernel: scsi2: PCI error Interrupt at seqaddr = 0x7d
Jun 29 17:04:42 fractus kernel: scsi2: Received a Target Abort
Jun 29 17:05:12 fractus kernel: scsi2:0:2:0: Attempting to queue an ABORT message
Jun 29 17:05:12 fractus kernel: scsi2: PCI error Interrupt at seqaddr = 0x7d
Jun 29 17:05:12 fractus kernel: scsi2: Received a Target Abort
Jun 29 17:05:12 fractus kernel: scsi2:0:2:0: Cmd aborted from QINFIFO
Jun 29 17:05:12 fractus kernel: aic7xxx_abort returns 8194
Jun 29 17:05:22 fractus kernel: scsi2:0:2:0: Attempting to queue an ABORT message
Jun 29 17:05:22 fractus kernel: scsi2: PCI error Interrupt at seqaddr = 0x7e
Jun 29 17:05:22 fractus kernel: scsi2: Received a Target Abort
Jun 29 17:05:22 fractus kernel: scsi2:0:2:0: Cmd aborted from QINFIFO
Jun 29 17:05:22 fractus kernel: aic7xxx_abort returns 8194
Jun 29 17:05:22 fractus kernel: scsi2:0:2:0: Attempting to queue an ABORT message
Jun 29 17:05:22 fractus kernel: scsi2: PCI error Interrupt at seqaddr = 0x7d
Jun 29 17:05:22 fractus kernel: scsi2: Received a Target Abort
Jun 29 17:05:22 fractus kernel: (scsi2:A:2:0): Queuing a recovery SCB
Jun 29 17:05:22 fractus kernel: scsi2:0:2:0: Device is disconnected, re-queuing SCB
Jun 29 17:05:22 fractus kernel: Recovery code sleeping
Jun 29 17:05:27 fractus kernel: Recovery code awake
Jun 29 17:05:27 fractus kernel: Timer Expired
Jun 29 17:05:27 fractus kernel: aic7xxx_abort returns 8195
Jun 29 17:05:27 fractus kernel: scsi2:0:2:0: Attempting to queue an ABORT message
Jun 29 17:05:27 fractus kernel: scsi2: PCI error Interrupt at seqaddr = 0x7e
Jun 29 17:05:27 fractus kernel: scsi2: Received a Target Abort
Jun 29 17:05:27 fractus kernel: Recovery SCB completes
Jun 29 17:05:27 fractus kernel: (scsi2:A:2:0): Queuing a recovery SCB
Jun 29 17:05:27 fractus kernel: scsi2:0:2:0: Device is disconnected, re-queuing SCB
Jun 29 17:05:27 fractus kernel: Recovery code sleeping
Jun 29 17:05:27 fractus kernel: Recovery code awake
Jun 29 17:05:27 fractus kernel: aic7xxx_abort returns 8194
//
// Lots of similar message blocks as above deleted.
// These appeared at regular 10-second intervals.
// Each time, the 'Commands Active' counter in /proc/scsi/aic7xxx/2
// decreased by 1.
//
Jun 29 17:15:37 fractus kernel: scsi2:0:2:0: Attempting to queue an ABORT message
Jun 29 17:15:37 fractus kernel: scsi2: PCI error Interrupt at seqaddr = 0x7e
Jun 29 17:15:37 fractus kernel: scsi2: Received a Target Abort
Jun 29 17:15:37 fractus kernel: scsi2:0:2:0: Cmd aborted from QINFIFO
Jun 29 17:15:37 fractus kernel: aic7xxx_abort returns 8194
Jun 29 17:15:37 fractus kernel: scsi2:0:2:0: Attempting to queue an ABORT message
Jun 29 17:15:37 fractus kernel: scsi2: PCI error Interrupt at seqaddr = 0x7d
Jun 29 17:15:37 fractus kernel: scsi2: Received a Target Abort
Jun 29 17:15:37 fractus kernel: scsi2:0:2:0: Device is active, asserting ATN
Jun 29 17:15:37 fractus kernel: Recovery code sleeping
Jun 29 17:15:42 fractus kernel: Recovery code awake
Jun 29 17:15:42 fractus kernel: Timer Expired
Jun 29 17:15:42 fractus kernel: aic7xxx_abort returns 8195
Jun 29 17:15:42 fractus kernel: scsi2:0:2:0: Attempting to queue a TARGET RESET message
Jun 29 17:15:42 fractus kernel: scsi2: PCI error Interrupt at seqaddr = 0x7e
Jun 29 17:15:42 fractus kernel: scsi2: Received a Target Abort
Jun 29 17:15:42 fractus kernel: scsi2:0:2:0: Device is active, asserting ATN
Jun 29 17:15:42 fractus kernel: Recovery code sleeping
Jun 29 17:15:47 fractus kernel: Recovery code awake
Jun 29 17:15:47 fractus kernel: Timer Expired
Jun 29 17:15:47 fractus kernel: aic7xxx_dev_reset returns 8195
Jun 29 17:15:47 fractus kernel: Recovery SCB completes
Jun 29 17:15:47 fractus kernel: Unable to handle kernel paging request at virtual address 003ffc0000206000
Jun 29 17:15:47 fractus kernel: scsi_eh_2(9): Oops 1
Jun 29 17:15:47 fractus kernel: pc = [iommu_arena_free+32/64] ra = [pci_unmap_sg+292/480] ps = 0007
Jun 29 17:15:47 fractus kernel: v0 = 0000000000000001 t0 = 0000000000000001 t1 = 003ffc0000206000
Jun 29 17:15:47 fractus kernel: t2 = 0000000000000000 t3 = 0000000000000000 t4 = 0000000000000002
Jun 29 17:15:47 fractus kernel: t5 = fffffc0000200000 t6 = fffffc00009a2690 t7 = fffffc000044c000
Jun 29 17:15:47 fractus kernel: s0 = 0000000000000001 s1 = 000000000001f000 s2 = fffffc0007f44018
Jun 29 17:15:47 fractus kernel: s3 = fffffc0000200080 s4 = 0000000000000000 s5 = ffffffffffffffff
Jun 29 17:15:47 fractus kernel: s6 = fffffc0007f44000
Jun 29 17:15:47 fractus kernel: a0 = fffffc0000200080 a1 = 0007fffffffffc00 a2 = 0000000000000010
Jun 29 17:15:47 fractus kernel: a3 = 0000000000000000 a4 = ffffffffffffffff a5 = 00000000000000ff
Jun 29 17:15:47 fractus kernel: t8 = 000000000000001f t9 = fffffc00009c5f08 t10= fffffc00009c7290
Jun 29 17:15:47 fractus kernel: t11= 0000000100000000 pv = fffffc000081ae40 at = fffffc00009c5880
Jun 29 17:15:47 fractus kernel: gp = fffffc00009c30b8 sp = fffffc000044fb30
Jun 29 17:15:47 fractus kernel: Code: 43f209a1 ALU zero,a2,t0
Jun 29 17:15:47 fractus kernel: 42220642 s8addq a1,t1,t1
Jun 29 17:15:47 fractus kernel: e4200008 blt t0,.+36
Jun 29 17:15:47 fractus kernel: 2fe00000 ldq_u zero,0(v0)
Jun 29 17:15:47 fractus kernel: 47ff041f or zero,zero,zero
Jun 29 17:15:47 fractus kernel: 2fe00000 ldq_u zero,0(v0)
Jun 29 17:15:47 fractus kernel: *b7e20000 stq zero,0(t1)
Jun 29 17:15:47 fractus kernel: 40603403 addq t2,1,t2
Jun 29 17:15:47 fractus kernel:
Jun 29 17:15:47 fractus kernel: Trace:8cd0fc 8d9878 8d9c80 8ce854 8c40ec 8c49bc 8c4fa8 810610 8c4e40
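The faulting address in the oops above can be reproduced from the register dump. The following is only a sketch of that arithmetic, not a diagnosis. It assumes the dumped a1 is the value that `s8addq a1,t1,t1` used (none of the instructions shown between it and the faulting `stq zero,0(t1)` write to a1) and that t1 in the dump is the value after that instruction. The names `scaled_index` and `base_before` are my own labels.

```python
MASK64 = (1 << 64) - 1

# Values copied from the register dump in the oops.
a1 = 0x0007fffffffffc00          # index operand of s8addq
fault_addr = 0x003ffc0000206000  # t1 after s8addq; equals the faulting address

# s8addq a1,t1,t1 computes t1 = a1 * 8 + t1 with 64-bit wrap-around.
scaled_index = (a1 * 8) & MASK64

# Recover what t1 held before the instruction ran.
base_before = (fault_addr - scaled_index) & MASK64

print(hex(scaled_index))  # 0x3fffffffffe000
print(hex(base_before))   # 0xfffffc0000208000
```

Under these assumptions the base pointer looks like an ordinary kernel address (close to t5 and a0 in the dump), while the index in a1 is implausibly large. That would be consistent with, but does not prove, `pci_unmap_sg` being called for a mapping that was never set up, for example after the earlier `pci_map_sg failed` message.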
At this point, the 'Commands Active' counter in /proc/scsi/aic7xxx/2 reached
0. The system was still somewhat alive, but would not shut down properly; it
hung at the point where it should unmount the file systems.
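To put numbers on the recovery messages in the log above (for instance when arguing for a replacement), the events can be counted and the gaps between abort attempts measured. This is a minimal sketch: the message substrings are copied from the log excerpt, while the function name `summarize`, the event labels, and the `year` parameter (syslog timestamps carry no year) are my own assumptions. It also assumes English month abbreviations in the current locale.

```python
import re
from collections import Counter
from datetime import datetime

# Substrings taken verbatim from the log excerpt; the labels are hypothetical.
PATTERNS = {
    "abort_attempt": "Attempting to queue an ABORT message",
    "target_reset_attempt": "Attempting to queue a TARGET RESET message",
    "pci_error": "PCI error Interrupt",
    "target_abort": "Received a Target Abort",
    "timer_expired": "Timer Expired",
}

# Syslog-style prefix such as "Jun 29 17:05:12 ".
STAMP = re.compile(r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) ")


def summarize(lines, year=2001):
    """Count recovery events and return the gaps (seconds) between abort attempts."""
    counts = Counter()
    abort_times = []
    for line in lines:
        m = STAMP.match(line)
        if not m:
            continue  # skip lines without a timestamp, e.g. "[ ... ]"
        for label, needle in PATTERNS.items():
            if needle in line:
                counts[label] += 1
                if label == "abort_attempt":
                    ts = datetime.strptime(f"{year} {m.group(1)}",
                                           "%Y %b %d %H:%M:%S")
                    abort_times.append(ts)
    gaps = [int((b - a).total_seconds())
            for a, b in zip(abort_times, abort_times[1:])]
    return counts, gaps


sample = """\
Jun 29 17:05:12 fractus kernel: scsi2:0:2:0: Attempting to queue an ABORT message
Jun 29 17:05:12 fractus kernel: scsi2: PCI error Interrupt at seqaddr = 0x7d
Jun 29 17:05:12 fractus kernel: scsi2: Received a Target Abort
Jun 29 17:05:22 fractus kernel: scsi2:0:2:0: Attempting to queue an ABORT message
Jun 29 17:05:27 fractus kernel: scsi2:0:2:0: Attempting to queue an ABORT message
""".splitlines()

counts, gaps = summarize(sample)
print(dict(counts))
print(gaps)  # [10, 5]
```

To run it over a full log, pass the lines of the file instead of `sample`; the location of that file (for example /var/log/messages) depends on the syslog configuration and is not stated in this mail.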