pre10 on a 2940u2w still shows BRKADRINT

Robert G. Brown rgb at phy.duke.edu
Sun Sep 20 10:48:19 PDT 1998


On Sat, 19 Sep 1998, Doug Ledford wrote:

> Robert G. Brown wrote:
> 
> > > > into a similar BRKADRINT bug.  Whereas the pre5 would loop with
> > > >     Data-Path Ram Parity Error
> > > >     PCI Error Detected
> > > >     (scsi 0) SEQADDR = 0x50
> > > > the pre10 kernel would loop with
> > > >     Data-Path Ram Parity Error
> > > >     (scsi 0) SEQADDR = 0x17d        [the first time]
> > > >     (scsi 0) SEQADDR = 0x1  [subsequent times]
> > > ...
> > 
> > The problem with a relatively new driver is that it is difficult to
> > separate out problems with the driver, problems with the hardware (yes,
> > some hardware IS just plain old defective), problems with the firmware,
> > and problems with the attached system BIOS and conflicts with other
> > devices on the PCI bus.
> 
> In this case, I think this is a problem with the driver.  However, just what
> that problem is isn't known yet :)  I'm currently leaning towards the idea
> that I haven't *completely* written every memory location on those cards,
> and if you want to be completely anal about the Adaptec docs, you are
> supposed to write something to every location on those cards to initialize
> the parity bits.  So my next driver version will be putting in code to
> hopefully hit more of the locations and get things set up properly.  You
> never know; maybe this error comes from a parity error in the data FIFO.
> Anyway, as soon as I can reproduce the problem here, I'll get it
> fixed.  I'm working on that :)
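The parity-initialization idea can be illustrated with a toy model.  This is a hypothetical sketch only: the region size (SRAM_SIZE), the power-up contents, and the read/write behaviour are assumptions, not the actual AIC-7xxx memory map or the driver's code.  It shows why memory the host never writes can still trip a parity check, and why writing every location once clears it.

```python
# Toy model only: SRAM_SIZE, the power-up contents, and the parity
# behaviour below are hypothetical, not Adaptec hardware specifics.

SRAM_SIZE = 64  # hypothetical size of one on-card memory region


def parity_bit(value: int) -> int:
    """Return the XOR of the 8 data bits (0 or 1)."""
    return bin(value & 0xFF).count("1") & 1


class ParitySram:
    """On-card memory where each byte's parity bit is only set by a write."""

    def __init__(self, size: int = SRAM_SIZE) -> None:
        # Assumed power-up state: leftover data whose parity bits were
        # never computed for it (modelled here as all zero).
        self.data = [addr & 0xFF for addr in range(size)]
        self.par = [0] * size

    def write(self, addr: int, value: int) -> None:
        # A host write stores the byte together with its matching parity bit.
        self.data[addr] = value & 0xFF
        self.par[addr] = parity_bit(value)

    def read(self, addr: int) -> int:
        # A read checks the stored parity, like a hardware parity checker.
        if self.par[addr] != parity_bit(self.data[addr]):
            raise ValueError(f"parity error at 0x{addr:x}")
        return self.data[addr]

    def bad_locations(self) -> list[int]:
        # Addresses whose stored parity disagrees with their data.
        return [a for a in range(len(self.data))
                if self.par[a] != parity_bit(self.data[a])]


def init_range(sram: ParitySram, start: int, end: int, fill: int = 0) -> None:
    """Write `fill` to every address in [start, end) to set its parity."""
    for addr in range(start, end):
        sram.write(addr, fill)


if __name__ == "__main__":
    partial = ParitySram()
    init_range(partial, 0, SRAM_SIZE // 2)  # only half of the region
    print("partial init, bad locations:", len(partial.bad_locations()))  # 16

    full = ParitySram()
    init_range(full, 0, SRAM_SIZE)  # every location
    print("full init, bad locations:", len(full.bad_locations()))  # 0
```

In this model a partial initialization leaves stale parity only in the untouched half, so errors would show up only when something (for example a FIFO) happens to read those locations.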

Could such a problem survive a power-down?  What you describe is certainly
consistent with my early experiences with a "working" pre7 driver, but
only if I had booted NT at least once first (presumably it wrote
everything and set all the parity bits?).  However, I would have
expected that a COMPLETE power-down (where I pull the plug in the back and
hit the power button one more time to drain the capacitors in the
attempted ATX startup surge) would reset everything to ground
zero...

Your problem reproducing it may be statistical.  Right now my first-order
estimate of the odds of it occurring is 1/8, but per machine, not per
boot or power-down.  You might try booting your system a bunch of times
with pre3-pre7; it is quite possible that something in pre3 or pre4 is
"toxic" enough that it corrupts the right places in a way that survives
between boots (somehow).  Is there any chance that one of these drivers
could have overwritten the Adaptec BIOS, in whole or in part?  (This may be
a dumb question, since I don't really know how the BIOS flash works in the
first place, but if it is a matter of writing to the wrong part of the
address space, it seems possible, which is why I ask.)
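Taking the 1/8-per-machine estimate at face value, and assuming (this is an assumption, not a measurement) that each machine is affected independently, the chance that a developer's n test machines include at least one affected box is:

```latex
P(\text{at least one of } n \text{ machines affected})
  = 1 - \left(1 - \tfrac{1}{8}\right)^{n}
  = 1 - \left(\tfrac{7}{8}\right)^{n}
```

For one machine that is 0.125; for four it is 1 - 2401/4096 ≈ 0.41.  Across 16 machines the expected number of affected systems is 16 × 1/8 = 2.  Rebooting a single unaffected machine many times does not change these odds.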

> 
> > cards, and disks).  As of the Linux 5.1.0pre10 driver, I seem to be able to
> > boot and run nearly all of them. 
> 
> Excellent :)  I was hoping more of your machines would start working with
> pre-10.

All the machines that never booted or ran any driver earlier than pre7
run fine, hence my suggestion about pre3 or the like.  Of course, even 16
machines isn't much of a statistical universe, especially with one known
hardware problem (the bad memory bit), although the Dell BIOS seems
robust against that one: it just reduces the reported available
memory.  This won't help Linux in the long run, since I have to tell it
how much memory the system has by hand anyway, but it won't crash it
immediately.

> > 
> > If there IS a good suspicion that it is still a driver problem, I still
> > have two good (that is, bad:-) systems that I can try any solutions on.
> 
> Well, I still think it's a driver problem.  Essentially, my stance on the
> issue is that the driver should be able to work with the hardware regardless
> of BIOS bugs, since the only things we really use the BIOS for are a
> *very* select few items (such as the proper state of STPWRLEV in the
> DEVCONFIG register).  These are machine/device dependent, so we can't init
> those reliably (although pre-11 allows you to pass special params to the
> driver to force these settings).  Other than a few things like that, the
> driver should be able to work regardless of any BIOS bugs.

As always, if there is anything I can do to help things out, let me
know.  I could, for example, give you an account on our network and
bring up one of the systems diskless (with no aic7xxx loaded).  You
could then work on a system that definitely has the problem whenever I
was around to go down and hard-boot it when it got stuck.  Booted
diskless with RO root, you literally couldn't hurt anything I couldn't
put back in two minutes.  If you want to try this, send me a passwd
line and I'll set it up tomorrow or Tuesday.

    rgb

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
