Kernel Panic on Cold Start with Adaptec AIC7XXX Rev. 6.2.4

Justin T. Gibbs gibbs at btc.adaptec.com
Tue Dec 18 14:58:25 PST 2001


Sorry for the slow response on this.  I've been snowed under working
on Linux U320 support...

>I updated from Linux 2.4.8 to 2.4.13 and now get a kernel panic on a 
>cold start. This reflects the AIC7XXX driver change from
>Rev 6.2.1 to 6.2.4.  After an initial kernel panic, a warm "reset"
>effects a successful boot.  My controller is the Adaptec
>3940 Ultra SCSI PCI adapter with two controllers.  scsi1 is enabled in
>the controller's BIOS, but is not cabled.

The problem is a parity error caused by either the firmware or the
kernel reading a piece of SCB or scratch RAM that has not previously
been written.  Presumably a warm reset succeeds because that location
still holds valid data, with correct parity, from the previous boot.
The driver deliberately does not clear every memory location before
initialization, because doing so would mask bugs in the initialization
code: every location that is referenced should be initialized to its
correct value.  This approach works well as long as I can reproduce
the parity error here.  Unfortunately, even with many different
controllers in many different machines, I have not been able to do
so.  I have also reviewed the code changes between 6.2.1 and 6.2.4
and have not found a "smoking gun".

Since you can reproduce this, I'm hoping you can help track it down.
By performing a "binary search" on the two types of memory (SCB RAM
and scratch RAM), we should be able to find the location that is
causing the problem.  Once we have the offset, determining why that
location is referenced before being initialized should be fairly easy.
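The bisection can be sketched in C as follows. This is a hypothetical illustration: `find_bad_offset()`, `cold_boot_test_t` and `simulated_boot()` are invented names, and on real hardware each oracle call means rebuilding the kernel and power-cycling the machine. It assumes exactly one offending location, a half-open range [lo, hi), and that clearing the whole range already lets the machine cold boot.

```c
#include <assert.h>

/*
 * Hypothetical oracle: returns nonzero if clearing offsets [lo, hi)
 * before initialization lets the machine cold boot.  Here it is a
 * function pointer so the search logic can be exercised in user space.
 */
typedef int (*cold_boot_test_t)(int lo, int hi, void *ctx);

/*
 * Bisect [lo, hi) down to the single offset whose clearing fixes the
 * cold boot.  Assumes exactly one such offset exists.  *boots counts
 * how many cold boots (oracle calls) the search needed.
 */
static int
find_bad_offset(int lo, int hi, cold_boot_test_t test, void *ctx, int *boots)
{
	assert(lo < hi);
	while (hi - lo > 1) {
		int mid = lo + (hi - lo) / 2;

		(*boots)++;
		if (test(lo, mid, ctx))
			hi = mid;	/* culprit is in the lower half */
		else
			lo = mid;	/* so it must be in the upper half */
	}
	return (lo);
}

/* Simulated hardware: only clearing offset *ctx fixes the boot. */
static int
simulated_boot(int lo, int hi, void *ctx)
{
	int bad = *(int *)ctx;

	return (bad >= lo && bad < hi);
}
```

For a 32-location range this takes five cold boots (log2 of 32) instead of up to 32 when trying one location at a time. The `ahc_reset()` range BUSY_TARGETS..SEQ_FLAGS2 is inclusive, so in this convention it would be passed as lo = BUSY_TARGETS, hi = SEQ_FLAGS2 + 1.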

You may need to modify two pieces of code, one at a time:

1) In drivers/scsi/aic7xxx/aic7xxx.c:ahc_reset(), add these lines
   at the end of the function, just before it returns:

	/* Debug: zero scratch RAM from BUSY_TARGETS through SEQ_FLAGS2. */
	for (wait = BUSY_TARGETS; wait <= SEQ_FLAGS2; wait++)
		ahc_outb(ahc, wait, 0);

   If you still can't cold boot your system, remove this code and
   try step two below.

   If the system does cold boot, we now have to narrow down which
   location within the above range is causing the problem.  You should
   be able to do a binary search on the location (i.e. initialize only
   half of the range, determine which half is at fault, then recurse on
   the half with the problem).

2) drivers/scsi/aic7xxx/aic7xxx.c:ahc_probe_scbs(), do something like this:

/*
 * Determine the number of SCBs available on the controller
 */
int
ahc_probe_scbs(struct ahc_softc *ahc)
{
	int i;

	for (i = 0; i < AHC_SCB_MAX; i++) {
		int j;

		ahc_outb(ahc, SCBPTR, i);
		ahc_outb(ahc, SCB_BASE, i);
		if (ahc_inb(ahc, SCB_BASE) != i)
			break;
		/*
		 * Added code: clear the rest of this SCB.  Byte 0 holds
		 * the probe value, so start at offset 1.  Narrow the
		 * range of j to bisect.
		 */
		for (j = 1; j < 32; j++)
			ahc_outb(ahc, SCB_BASE + j, 0);
		/* End added code. */
		ahc_outb(ahc, SCBPTR, 0);
		if (ahc_inb(ahc, SCB_BASE) != 0)
			break;
	}
	return (i);
}

   Perform a similar binary search on the SCB memory, narrowing the
   range of j that is cleared, until you determine which offset is
   at fault.
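The probe loop from step 2 can be exercised outside the kernel against a mock register file. In this sketch `struct ahc_softc`, `ahc_outb()`, `ahc_inb()`, the register offsets, `AHC_SCB_MAX`, `SCB_WINDOW` and `model_power_on()` are stand-ins invented for the model, not the driver's definitions; the only assumption about the hardware is that SCB page numbers wrap, so selecting a missing SCB aliases an existing one.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical register layout for this model only, not the chip's. */
#define SCBPTR		0x00	/* selects which SCB page is visible */
#define SCB_BASE	0x20	/* first byte of the SCB window */
#define SCB_WINDOW	32	/* bytes per SCB, as in the step 2 loop */
#define AHC_SCB_MAX	255	/* assumed probe limit for the model */

/* Mock softc: just enough state to model paged SCB RAM. */
struct ahc_softc {
	int	nscbs_physical;		/* SCBs actually present */
	uint8_t	scbptr;
	uint8_t	scb_ram[AHC_SCB_MAX][SCB_WINDOW];
};

static void
ahc_outb(struct ahc_softc *ahc, unsigned port, uint8_t val)
{
	if (port == SCBPTR)
		ahc->scbptr = val;
	else if (port >= SCB_BASE && port < SCB_BASE + SCB_WINDOW)
		/* Page numbers wrap, so missing SCBs alias existing ones. */
		ahc->scb_ram[ahc->scbptr % ahc->nscbs_physical]
		    [port - SCB_BASE] = val;
}

static uint8_t
ahc_inb(struct ahc_softc *ahc, unsigned port)
{
	if (port == SCBPTR)
		return (ahc->scbptr);
	if (port >= SCB_BASE && port < SCB_BASE + SCB_WINDOW)
		return (ahc->scb_ram[ahc->scbptr % ahc->nscbs_physical]
		    [port - SCB_BASE]);
	return (0xFF);
}

/* Fill SCB RAM with garbage to mimic undefined power-on contents. */
static void
model_power_on(struct ahc_softc *ahc, int nscbs)
{
	assert(nscbs > 0 && nscbs <= AHC_SCB_MAX);
	memset(ahc->scb_ram, 0xA5, sizeof(ahc->scb_ram));
	ahc->nscbs_physical = nscbs;
	ahc->scbptr = 0;
}

/* The step 2 probe loop, including the corrected clearing loop. */
static int
ahc_probe_scbs(struct ahc_softc *ahc)
{
	int i;

	for (i = 0; i < AHC_SCB_MAX; i++) {
		int j;

		ahc_outb(ahc, SCBPTR, i);
		ahc_outb(ahc, SCB_BASE, i);
		if (ahc_inb(ahc, SCB_BASE) != i)
			break;
		for (j = 1; j < SCB_WINDOW; j++)
			ahc_outb(ahc, SCB_BASE + j, 0);
		ahc_outb(ahc, SCBPTR, 0);
		if (ahc_inb(ahc, SCB_BASE) != 0)
			break;
	}
	return (i);
}
```

With 16 SCBs modeled, the probe stops at 16 because selecting page 16 lands on page 0 and overwrites the 0 stored there. The clearing loop writes `SCB_BASE + j`; writing `SCB_BASE` on every pass would leave bytes 1 through 31 holding their power-on garbage.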

--
Justin
