Adaptec termination problem...

Robert G. Brown rgb at phy.duke.edu
Fri May 22 11:33:51 PDT 1998


Dear aic7xxx folks:

I've been using aic7xxx controllers under Linux for years, usually
with good success.  One exception seems to be associated with
motherboards with built-in Adaptec controllers and vendor-set
termination.  I'm currently trying to debug inexplicable crashes on a
system with a Supermicro P6DLS (dual 300MHz PII with embedded
2940UW).  The system has a Sony CD-R and an HP DAT on the 50 pin
internal channel and an 18GB IBM HD on the 68 pin internal UW
channel.  I have termination set at auto, and it seems to correctly
set low/high off/on so as to not terminate the controller itself.

During boot the BIOS finds all devices correctly, the system boots
flawlessly, and it runs for quite a while with zero messages or
complaints of any sort.  I'm running Linux 2.0.33 SMP, with the latest
5.0.17 Adaptec driver installed just this morning, and am running a
properly SMP-patched eepro100 driver for the network card.  The crashes I'm
reporting appear to occur independent of whether or not the kernel is
SMP, so I don't believe that the problem is a kernel SMP deadlock or
the like.  I also set mem=254M in lilo.conf so that running out of
memory is not a problem.
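
For anyone unfamiliar with the mem= setting, it is passed to the kernel
through an append line in lilo.conf.  A minimal sketch follows; the
image path and label are placeholders assumed for illustration, not
values copied from this system:

```
# /etc/lilo.conf fragment (image path and label are assumed placeholders)
image=/boot/vmlinuz
    label=linux
    append="mem=254M"
```

/sbin/lilo has to be rerun after editing the file for the change to
take effect at the next boot.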

ANYWAY, when a large (~64M memory) disk I/O intensive job is run on
the system (on only one processor, with nothing in particular
happening on the other), the system runs for a while and then simply
dies in between clock ticks with no messages, warnings, or hope of
recovery.  Upon reboot, the system has a fairly corrupted disk in the
working partition of the large job (not surprisingly, it dies in
midstream with no sync) and sometimes the SCSI controller has to be
power cycled to come up cleanly at the BIOS level, making me think
that it is somehow involved in the crash rather than a passive
participant.

I had somewhat similar symptoms on a nearly identical system; they went
away when I BOTH checked all the SCSI termination AND fixed the
network drivers to be SMP safe.  I'm still not sure what fixed what.
On this particular system, I've carefully checked the termination on
the 50 pin chain and it is OK, but the vendor (Aberdeen) appears to
have left the IBM unterminated, as nearly as I can tell with no
documentation on the disk.  This same vendor had also screwed up
internal termination in the previous system they sold me; in that case
it was the CD-R/HP tape that were not correctly terminated.

I'm trying to fix the termination of the HD, but without docs it is a
matter of guessing where to put a jumper and so far I'm having no
luck.  And I'm not certain that termination is actually at fault.
SO, at last my questions:

  a) Is there a software mechanism for determining faulty SCSI bus
termination?  Preferably reliable and non-fatal to the system?  This
is almost certainly an ongoing problem with sloppy vendors everywhere;
the trouble is that one can often get away without terminating on a
low-impact system or can't tell that a crash is due to improper
termination on a system that crashes all the time anyway (e.g.
Windoze).  Linux, however, tends to be awesomely stable even under
crushing loads -- crashing at a load average of <1/CPU is not normal,
but the system STILL doesn't seem to flag the crashes in a way that
helps you debug the termination.

  b) Is there a chance that the real problem is with the IBM drive
itself?  I've never run an 18 GB disk before, and worry a bit that I'm
exceeding some built-in limit that I didn't know existed.  Has anyone
out there had any particular problems (or success) with the 18 GB
drives?  Can anyone tell me how to jumper it to terminate it
correctly?  Is there any limit on how big the partitions can be under
Linux?  Sorry if any of these questions are stupid and in a FAQ --
feel free to point me as gently as possible at the FAQ if they are.

   rgb

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu



