Kernel-SCSI crash. (Serious bug since 5.0.11?)

Stephan Loescher loescher at leo.org
Fri Jul 9 09:19:22 PDT 1999



Hi!

I have found a bug that first appears in the aic7xxx code in Linux
2.0.34 (5.0.14/3.2.4) and is still present in recent 2.3.xx kernels! The
aic7xxx code in Linux 2.0.33 (4.1.1/3.2.1) runs stable for me.

The symptoms:
When I copy a lot of large files from my hard disk (IBM DCAS-34330W) to
my magneto-optical (MO) drive, then after some time (5 seconds to
several minutes) the Linux kernel stops. The system is frozen (locked
up) and the SCSI bus LED, the MO LED and the hard disk LED all stay lit. I
can't log into my system. Mouse and keyboard are "dead".
The source and target filesystems are ext2.
I can reproduce this behaviour.
I can copy files between all my harddisks without any error.
With kernel 2.0.33 there are no problems!
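
The copy load described above could be scripted roughly as follows. This is only a minimal sketch: the function name stress_copy, the directory layout and the round count are placeholders chosen for illustration, not the exact commands used in the report.

```shell
#!/bin/sh
# Hypothetical reproduction helper (names and paths are assumptions).
# Usage: stress_copy SRC_DIR DST_DIR ROUNDS
# Copies every regular file in SRC_DIR into DST_DIR/roundN, ROUNDS times,
# and calls sync after each round so the data is flushed to the device.
stress_copy() {
    src=$1
    dst=$2
    rounds=$3
    i=1
    while [ "$i" -le "$rounds" ]; do
        mkdir -p "$dst/round$i" || return 1
        for f in "$src"/*; do
            [ -f "$f" ] || continue      # skip subdirectories and empty globs
            cp "$f" "$dst/round$i/" || return 1
        done
        sync
        echo "round $i done"
        i=$((i + 1))
    done
}

# Example call (placeholder paths, not run automatically):
#   stress_copy /data/large /mnt/mo 10
```

If the machine locks up, the last "round N done" line on the console gives a rough idea of how much data was written before the freeze.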

Following linux/Documentation/BUG-HUNTING, I narrowed it down to the
aic7xxx code: when I replace the aic7xxx files in 2.0.34 with
the files from 2.0.33, the system runs stable.
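
A minimal sketch of such a driver swap between two unpacked kernel trees is shown here. It assumes the driver files live under drivers/scsi/ and match the pattern aic7xxx*; the function name swap_driver and the backup directory aic7xxx-backup are made up for this example. The kernel has to be rebuilt afterwards.

```shell
#!/bin/sh
# Hypothetical helper (name, backup location and file pattern are assumptions).
# Usage: swap_driver OLD_TREE NEW_TREE
# Moves NEW_TREE's aic7xxx driver files into NEW_TREE/aic7xxx-backup,
# then copies OLD_TREE's aic7xxx driver files into their place.
swap_driver() {
    old=$1
    new=$2
    backup="$new/aic7xxx-backup"
    mkdir -p "$backup" || return 1
    # Move the newer driver out of the way.
    for f in "$new"/drivers/scsi/aic7xxx*; do
        [ -e "$f" ] || continue
        mv "$f" "$backup/" || return 1
    done
    # Drop in the older driver (files and any driver subdirectory).
    for f in "$old"/drivers/scsi/aic7xxx*; do
        [ -e "$f" ] || continue
        cp -R "$f" "$new/drivers/scsi/" || return 1
    done
}

# Example call (placeholder paths, not run automatically):
#   swap_driver /usr/src/linux-2.0.33 /usr/src/linux-2.0.34
```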

I have tried the following kernels:
2.0.34
2.0.35
2.0.36
2.1.128
2.2.2
2.2.5
2.2.6
2.2.7
2.2.10
2.3.4
(with and without all AC-patches)

Disabling all aic7xxx features does not help either.
I tried these options:
aic7xxx=verbose, aic7xxx=pci_parity, aic7xxx=verbose:0x1ffff
and disabled TAGGED_QUEUEING entirely.

To help you find the bug, I tried all aic7xxx patches for Linux
2.0.33 from the last 4.x.x up to 5.0.13. The results are:

5.0.0 /3.2.2: OK
5.0.1 /3.2.2: does not boot, seems _very_ unstable
5.0.10/3.2.2: OK
5.0.11/3.2.2: Endless SCSI resets after issuing commands like
              echo "scsi remove-single-device 0 0 1 0 " >/proc/scsi/scsi
5.0.12/3.2.2: locks up the system as 5.0.14 does!
5.0.13/3.2.2: locks up the system as 5.0.14 does!

My system:
Pentium-200 (single-CPU)
SCSI-HA: Adaptec 3490U, Bios 1.24
Channel A:
0 : CD Sony CDU-76S
1 : HD Seagate ST32430N
3 : CDRW Yamaha CRW4416S 1.0f
4 : Streamer Tandberg NS20 Pro 
5 : HD IBM DCAS-34330
6 : HD IBM DCAS-34330W
(End of SCSI-bus with active termination, and AHA with auto-termination.)
Channel B:
0 : Olympus Deltis-MOS320 (MO)
3 : HP ScanJet
(End of SCSI-bus with passive termination, and AHA with auto-termination.)

What was changed in the aic7xxx-code after 5.0.10/3.2.2?

What can I do to help you find the bug?

Stephan.

-- 
loescher at leo.org
http://www.leo.org/~loescher/





