Problem w/ SMP and aic7xxx

Joseph T. Trudeau jttrudeau at mindspring.com
Tue Apr 13 00:59:39 PDT 1999


Problem:
When running a kernel that supports SMP, I get SCSI timeouts and resets
under load (it doesn't take much: copying files or compiling will do the
trick).  I enabled as many verbose flags as possible within aic7xxx, and
it looks as though SCSI commands (some of which have already completed)
are being dropped (see the messages included below).

If I boot with a non-SMP kernel, I CANNOT reproduce the errors (though
perhaps one CPU simply can't generate as much disk traffic as two).
Hence, I suspect SMP, the IO-APIC, and/or the aic7xxx driver.

I have tried kernels 2.0.36, 2.2.1, 2.2.3, and 2.2.5 with many
combinations of compile options (e.g. PCI bridging, MTRR, and anything
else I found that might be related to the problem at hand).

Hardware:
  HP Netserver LH Pro
  128 Meg RAM (2 - 64 Meg DIMM)
  2 - Pentium Pro 200s
  2 - aic7880 on-board (PCI):  They share interrupt 11, and this cannot
            be changed so that each adapter gets its own interrupt (the
            EISA config utility promptly sets both adapters to the same
            interrupt whenever either one is changed).

NOTE:  I noticed that the 1st CPU has 512K cache while the 2nd CPU only
has 256K cache.

Software:
  Kernel 2.2.5
  Raid 0.90
  aic7xxx v5.13

Except for the cache difference between the two CPUs, I have ruled out
hardware problems (or at least I think I have) through multiple tests
with and without SMP, diagnostic utilities, removing and swapping DIMMs,
etc.

The following are a limited set of debug messages from the aic7xxx
driver:
Apr 10 18:40:11 lachesis kernel: scsi : aborting command due to timeout : pid 4274, scsi0, channel 0, id 1, lun 0 Write (10) 00 00 48 00 e8 00 00 08 00
Apr 10 18:40:11 lachesis kernel: (scsi0:0:1:0) Abort called for already completed command.
Apr 10 18:40:11 lachesis kernel: scsi : aborting command due to timeout : pid 4275, scsi0, channel 0, id 1, lun 0 Write (10) 00 00 49 04 90 00 00 08 00
Apr 10 18:40:11 lachesis kernel: (scsi0:0:1:0) Aborting scb 10, flags 0x4
Apr 10 18:40:11 lachesis kernel: (scsi0:0:1:0) SCB is currently active. Waiting on completion.
Apr 10 18:40:11 lachesis kernel: scsi : aborting command due to timeout : pid 4277, scsi1, channel 0, id 4, lun 0 Write (10) 00 00 49 04 90 00 00 08 00
Apr 10 18:40:11 lachesis kernel: (scsi1:0:4:0) Aborting scb 10, flags 0x6
Apr 10 18:40:11 lachesis kernel: (scsi1:0:4:0) SCB found on waiting list and aborted.
Apr 10 18:40:11 lachesis kernel: (scsi1:0:4:0) Aborting scb 10
Apr 10 18:40:11 lachesis kernel: (scsi1:-1:-1:-1) 1 commands found and queued for completion.
Apr 11 14:43:56 lachesis kernel: scsi : aborting command due to timeout : pid 13827, scsi1, channel 0, id 4, lun 0 Write (10) 00 00 30 00 10 00 00 08 00
Apr 11 14:43:56 lachesis kernel: (scsi1:0:4:0) Aborting scb 11, flags 0x4
Apr 11 14:43:56 lachesis kernel: (scsi1:0:4:0) SCB disconnected. Queueing Abort SCB.
Apr 11 14:43:56 lachesis kernel: (scsi1:0:4:0) Abort message mailed.
Apr 11 14:43:56 lachesis kernel: (scsi0:0:1:0) SCB 13 abort delivered.
Apr 11 14:43:56 lachesis kernel: (scsi0:0:1:-1) Reset device, active_scb 2
Apr 11 14:43:56 lachesis kernel: (scsi0:0:1:-1) Cleaning up status information and delayed_scbs.
Apr 11 14:43:56 lachesis kernel: (scsi0:0:1:0:tag12) matches search criteria (scsi0:0:1:-1:tag255)
Apr 11 14:43:56 lachesis kernel: (scsi0:0:1:0:tag9) matches search criteria (scsi0:0:1:-1:tag255)
Apr 11 14:43:56 lachesis kernel: (scsi0:0:1:-1) Cleaning QINFIFO.
Apr 11 14:43:56 lachesis kernel: (scsi0:0:1:-1) Cleaning waiting_scbs.



I have also received these two fatal kernel messages:

end_scsi_request: buffer-list destroyed
.
.
.
Kernel panic: Inactive in scsi_request_queueable


More information about the aic7xxx mailing list