5.1.0-pre3-to-pre4

Mon Jul 20 15:21:56 PDT 1998

Robert G. Brown wrote:
> 
> Well, I'm back at work and am still trying 5.1.0 on a Dell Poweredge
> with dual 400 MHz PII's.  I just finished trying the pre3-to-pre4
> patch that guarantees a reset of the device but get a lockup after
> exactly the same messages in exactly the same way that I reported
> Friday.  I continue to get a parity error in a loop that is hit every
> four or five seconds (from the "spurious interrupt" part of the
> driver, I believe) if I load only with maximum verbosity, and die
> after the same message as I laboriously typed in on Friday (with only
> one Data Parity error).
> 
> The error persists identically across SMP and UP kernels (2.0.34, for
> what it is worth).  I've examined /proc/pci before loading the module
> and it is unremarkable, except that the 7860 and 7890 appear to be
> sharing an interrupt (IRQ 10) with different ioports.  I can enclose
> this and any other pre-module-insert data requested on demand -- the
> system boots stably diskless so I have complete access to the running
> /proc and more.
> 
> I must point out that the code added to "definitely reset the bus" IS
> NOT EXECUTED!  I get no messages concerning "Resetting channel [A/B]"
> (or any of the added error messages saying not to use no_reset, which
> I'm not using;-)!
> 
> I'm going to try to hack at the code some more and see if I can see
> any reason that the device reset segment isn't being run.  The obvious
> answer is that the crash is somehow occurring before the device reset
> segment of the code, if that is at all possible.

It's not only possible, it's guaranteed.  If you aren't getting any of that
reset code, then the machine has to be crashing before it gets there.  The
easy explanation for why this is so is that you are getting these interrupts
as soon as the request_irq() call is made.  This is the first moment at
which we can get any interrupts.  It's not hard to believe that we would
take 500 interrupts before getting past that to the reset code.  Try this. 
Check out the status of the file /proc/stat before insmodding the driver. 
The line that starts with intr holds the interrupt counts on your machine.
It differs from /proc/interrupts in that /proc/stat reports *all* interrupt
sources, even if they aren't yet allocated to a handler.  This way, you can
check on some other things.  For example, on my P-II SMP system, with the
later 2.1.x kernels, I've had to manually disable IRQ9 from the IO-APIC IRQs
or else the system would, from boot up on through all operation, be
undergoing an IRQ storm from IRQ9 regardless of any devices attached or
not.  So, check to make sure the interrupt for your card isn't already
active and causing an IRQ storm before you load the module.  Assuming this
is clear, you can also do the following:

in aic7xxx_detect():
  In the third global loop structure, the one where we pull the four lists
  and call aic7xxx_register() on each item in each list, wrap the calls
  to aic7xxx_register() as follows:

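#if LINUX_VERSION_CODE < KERNEL_VERSION(2,1,0)
  /* 2.0.x: keep interrupts disabled while the card is registered */
  save_flags(cpu_flags);
  cli();
  aic7xxx_register(...);
  restore_flags(cpu_flags);
#else
  /* 2.1.x: hold io_request_lock with interrupts disabled instead */
  spin_lock_irqsave(&io_request_lock, cpu_flags);
  aic7xxx_register(...);
  spin_unlock_irqrestore(&io_request_lock, cpu_flags);
#endif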

and of course, define unsigned long cpu_flags; at the top of the function. 
Then let me know if you still don't see any reset code being executed and
the resultant delay as well :)

-- 

 Doug Ledford  <dledford at dialnet.net>
  Opinions expressed are my own, but
     they should be everybody's.
