hard to reproduce ... and to solve

Roberto Micarelli mi.ro at iol.it
Mon Aug 30 08:09:45 PDT 1999


Motherboard:

Portwell robo578 - AMD K6-2 300 MHz - 96 MB RAM - onboard Adaptec AIC7880 -
onboard Intel 82558 10/100 Ethernet chip
SCSI BIOS enabled
20 MB flash card (IDE) as boot/root disk (I've also tried using a 'normal'
hard disk, but nothing changes)


External SCSI devices:

Redundant RAID controller CMD CRD5641
4 Seagate disks in a single RAID-5 set


OS:

Linux 2.0.37, ipnat patched, with integrated aic7xxx (version 5.1.13) and
eepro100

Task:

nfsd exports a 4 GB ext2 partition on the SCSI disk, serving a network
client. A test program on the client continuously writes and deletes a
recursive copy.
The active SCSI transfer rate is 20mbps.

Symptom:

After 4 hours of continuous data transfer, the system starts
displaying the following messages on the console:

*date* localhost kernel: scsi: aborting command due to timeout: pid x, scsi0,
channel 0, id 0, lun 0 0x28 00 01 53 28 00 00 02 00
*date* localhost kernel: SCSI host 0 abort (pid xxx) timeout - resetting
*date* localhost kernel: SCSI bus is being reset for host 0 channel 0
*date* localhost kernel: scsi: aborting command due to timeout
....
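
For anyone reading the first message: the bytes after "lun 0" look like the
start of the SCSI command block. Opcode 0x28 is READ(10), so the timeouts hit
ordinary reads. Below is a minimal sketch that decodes those bytes. It
assumes the nine printed bytes are the first nine bytes of a ten-byte
READ(10) command (the final control byte is not in the log), and the function
name decode_read10 is made up for this illustration.

```python
def decode_read10(cdb_text):
    """Decode the leading bytes of a SCSI READ(10) command given as hex text.

    Assumption: the text holds at least the first 9 bytes of the command,
    as printed in the kernel message (the control byte may be missing).
    """
    cdb = [int(tok, 16) for tok in cdb_text.split()]
    if len(cdb) < 9:
        raise ValueError("need at least 9 command bytes")
    if cdb[0] != 0x28:
        raise ValueError("opcode is not READ(10)")
    # READ(10) layout: byte 0 opcode, byte 1 flags, bytes 2-5 logical block
    # address (big-endian), byte 6 reserved/group, bytes 7-8 transfer length.
    lba = int.from_bytes(bytes(cdb[2:6]), "big")
    blocks = int.from_bytes(bytes(cdb[7:9]), "big")
    return {"opcode": cdb[0], "lba": lba, "blocks": blocks}


info = decode_read10("0x28 00 01 53 28 00 00 02 00")
print("opcode 0x%02x, LBA 0x%08x, %d blocks"
      % (info["opcode"], info["lba"], info["blocks"]))
```

Under that assumption the command is a read of 0x0200 blocks starting at
logical block 0x01532800. Comparing the decoded address across several
timeout messages may show whether the failures cluster on one region of the
RAID set or are spread out.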

The messages repeat every few seconds and, after an arbitrary period of
instability, the kernel freezes.
No data transfer is possible in the meantime.
The aic7xxx options are at their defaults (5 sec. delay).

If I reset the motherboard (not the CMD), the system works fine again for the
next 4 hours ...

I've searched the mailing lists for specific help, but I found
only boot-time troubleshooting.

What we can't determine is whether the problem originates in hardware or in
software. SCSI termination seems to be right, and so does cable compliance ...

Can anybody help me? :(((((((((

roberto

