aic7xxx problems

Kristian Vlahovicek kristian at icgeb.org
Tue Nov 18 09:18:05 PST 2003


Hi Justin, thank you for your prompt answer and for pointing us in the right
direction for solving the problem!


>> We did the OS upgrade and installed the 2.4.20-8smp kernel (RH9
>> vanilla). The machine boots with aic7xxx dumping a whole lot of
>> messages (see below).
>
> Was this supposed to be in your attachment?  The attachment is empty.

Hmm, second try. I hope you get it this time.


>> An attempt to upgrade the kernel to a newer version (still the RH9 updated
>> 2.4.20-20.9smp) results in the boot process stopping just before the SCSI
>> loading. We tried to update the aic7xxx driver to v6.3.0 from Justin's
>> website, and that again resulted in a complete inability to boot
>> (same stop point as above).
>
> Did you do this via RPM or by building the driver from source?
> 6.3.3 is the latest RPM version on my website.

RPM. I noticed 6.3.3 only today; at the time we tried it, 6.3.0 was the most
recent.


>> Note that both kernels, 2.4.20-20.9 and 2.4.20-8, DO boot in their
>> single-processor versions, with both the original aic7xxx driver and
>> v6.3.0.
>
> I will have to see the messages, but you may be experiencing interrupt
> routing problems with the newer kernels.  Playing with APIC and ACPI
> settings may allow an SMP kernel to boot correctly.

Thanks! Do you have any suggestions on how to do that? One (maybe)
interesting thing is that aic7xxx does not complain at all and does not dump
anything when a non-SMP kernel boots. Could this be an APIC issue as
well?


>
>> What we see currently are RAID messages about kicking disks from the array
>> due to I/O errors (they look like hardware errors even though surface
>> tests do not find anything!):
>
> Unfortunately, the BIOS scan is rarely conclusive.  Media errors may
> only present themselves when the disk is at a higher temperature.  The
> single sector reads performed by the BIOS are not a sufficiently high
> load to elevate drive temperature to something similar to that of an
> active server.

Do you have any suggestions on how to extensively test our disks? What
would be the most appropriate tool?
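
One way to put more load on a drive than the single-sector BIOS scan is a
sustained sequential read of the whole device. The following is only a minimal
Python sketch of that idea, not a tool recommended anywhere in this thread.
The function name `scan_reads` and the chunk size are assumptions made for
illustration; it is read-only, and it assumes a Unix system where `os.pread`
is available.

```python
import os

def scan_reads(path, chunk_size=1024 * 1024):
    """Read `path` sequentially; return (bytes_read, offsets_that_failed)."""
    failed = []
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end gives the size of a file or Linux block device.
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            want = min(chunk_size, size - offset)
            try:
                total += len(os.pread(fd, want, offset))
            except OSError:
                # The kernel reported an I/O error for this chunk.
                failed.append(offset)
            offset += want
    finally:
        os.close(fd)
    return total, failed
```

Called on a device node (which needs root), the returned offsets mark chunks
the kernel could not read. Reads may be served from the page cache on a
second pass, so a single run over a large disk is the more meaningful test.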

>
>> -----------
>> Nov 18 11:50:33 hydra kernel: SCSI disk error : host 0 channel 0 id 1 lun 0 return code = 8000002
>> Nov 18 11:50:33 hydra kernel: Info fld=0x4005, Current sd08:11: sense key Hardware Error
>> Nov 18 11:50:34 hydra kernel:  I/O error: dev 08:11, sector 26908288
>> Nov 18 11:50:34 hydra kernel: raid5: Disk failure on sdb1, disabling device. Operation continuing on
>
> Unfortunately, the ASC and ASCQ codes are not given to better qualify
> the error, but this is a *hardware* not medium error.  This typically
> means that the device believes one of its components has failed.

Where can I get these codes? I'd like to find out what went wrong with
these babies!
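
For reference, the "return code" in the log above can be split into its
component bytes, and the sense key, ASC and ASCQ sit at fixed positions in
fixed-format SCSI sense data (byte 2 low nibble, byte 12 and byte 13). The
Python sketch below illustrates this. It assumes the kernel prints the return
code in hexadecimal with the driver byte on top and the SCSI status byte at
the bottom; the helper names are made up for this example, and the raw sense
bytes themselves do not appear in the log, so they would have to be obtained
some other way.

```python
# Sense key names from the SCSI specification (a subset).
SENSE_KEYS = {
    0x0: "No Sense",
    0x1: "Recovered Error",
    0x2: "Not Ready",
    0x3: "Medium Error",
    0x4: "Hardware Error",
    0x5: "Illegal Request",
    0x6: "Unit Attention",
    0xB: "Aborted Command",
}

def split_return_code(code_hex):
    """Split a SCSI result word, given as a hex string, into four bytes."""
    value = int(code_hex, 16)
    return {
        "status": value & 0xFF,          # SCSI status, 0x02 = CHECK CONDITION
        "msg": (value >> 8) & 0xFF,
        "host": (value >> 16) & 0xFF,
        "driver": (value >> 24) & 0xFF,  # 0x08 = sense data was returned
    }

def parse_fixed_sense(sense):
    """Return (sense key name, ASC, ASCQ) from fixed-format sense data."""
    if len(sense) < 14:
        raise ValueError("need at least 14 bytes of sense data")
    key = sense[2] & 0x0F
    return SENSE_KEYS.get(key, "key 0x%X" % key), sense[12], sense[13]

fields = split_return_code("8000002")
print(fields)
```

For `8000002` this gives status 0x02 and driver 0x08, with the host and
message bytes zero, which matches a command that ended in CHECK CONDITION
with sense data attached.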


>>  2 devices
>> ----------
>> Nov 18 15:38:45 hydra kernel: scsi0: ERROR on channel 0, id 8, lun 0, CDB: Read (10) 00 03 27 23 0f  00 00 f8 00
>> Nov 18 15:38:45 hydra kernel: Info fld=0x327231e, Current sd08:41: sense key Medium Error
>> Nov 18 15:38:45 hydra kernel:  I/O error: dev 08:41, sector 52896472
>> Nov 18 15:38:45 hydra kernel: raid5: Disk failure on sde1, disabling device. Operation continuing on
>>  3 devices
>
> This is a typical medium error.
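
The CDB and "Info fld" values in the log above can be related to each other.
The Python sketch below decodes the bytes printed after "Read (10)", assuming
the kernel prints the opcode as a name and then the remaining nine bytes of
the 10-byte command (flags, a 4-byte big-endian LBA, a reserved byte, a 2-byte
transfer length, and a control byte). The function name `decode_read10` is
made up for this example.

```python
def decode_read10(tail_hex):
    """Decode the nine bytes that follow the READ(10) opcode in the log."""
    # Drop all whitespace so irregular spacing in the log does not matter.
    b = bytes.fromhex("".join(tail_hex.split()))
    if len(b) != 9:
        raise ValueError("expected 9 bytes after the opcode")
    lba = int.from_bytes(b[1:5], "big")     # starting logical block address
    blocks = int.from_bytes(b[6:8], "big")  # number of blocks requested
    return lba, blocks

lba, blocks = decode_read10("00 03 27 23 0f  00 00 f8 00")
info = int("327231e", 16)                   # "Info fld" from the same log
print(lba, blocks, info - lba)
```

Under these assumptions the read starts at LBA 52896527 and covers 248
blocks, and the block reported in the Info field lies 15 blocks into that
transfer, which is consistent with a single bad spot on the medium.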

Hmph. It does look like the two disks have kicked the bucket, each in its
own way.


Thanks once again! You surely did shed some light here :)

Kristian


