Puzzle for Doug...

Robert G. Brown rgb at phy.duke.edu
Tue Jul 28 12:25:54 PDT 1998


On Tue, 28 Jul 1998, Mike Isely wrote:

> Well since the aic7xxx hardware executes DMA on its own behalf, that sort
> of memory access might look "different" enough to the hardware to expose a
> latent race condition.  Certainly there's more memory contention going on
> with the aic7xxx stuff in the picture. 

Good point.  I am also wondering if the high speed of the CPUs, the
memory and the U2 controller itself combine to reveal a race
condition.  I just really believe that the race is in the driver.

> Such memory tests never amount to more than a quickie existence check.
> "Leaky" DRAM cells (if such a thing could happen) can't be picked up
> for example because it would take many many microseconds for the bit(s) to
> go bad.  BIOS memory scans run way too fast for that.

Again, if it were "raw" bad DRAM, the system simply wouldn't work
regardless of the presence/absence of the aic7xxx driver.  Something
else would be using the critical memory during boot and fail.  I like
your DMA/race/contention hypothesis below much better.

> 
> > 
> > The only way that I could see the problem being bad memory is if the
> > SDRAM they put in the systems is somehow marginal and occasionally
> > fails but ONLY IN A WAY THE AIC7XXX DRIVER TWEAKS!  And only on the
> 
> Without any DMA devices active in the system, the memory activity is going
> to be limited to whatever the CPU causes.  Is there any known-DMA going on
> without the aic7xxx running?  With multiple independent (fast) devices
> initiating memory access, all sorts of contention issues can arise.  Of
> course, this is supposed to work, but without the aic7xxx stuff active you
> might not be beating on it hard enough to cause the trouble.  Remember the
> RZ1000 IDE problem a few years back?

Yeah, this occurred to me -- I have an eepro100 in the system and
there is indeed network traffic, especially during diskless boots.
It's harder to see this as a problem in NON-diskless boots, though.
Also, the network device is formally probed and initialized only AFTER
the scsi device.  Finally, I unplugged the cable during a boot or two
so that it wasn't actually receiving packets during boot.  No effect.
Still, a definite possibility.

> 
> Just fishing for ideas for ya.  I think a game of musical hardware is
> definitely the next step here.  But even that may not give conclusive
> results if something in Dell's configuration is "right on the edge". 

And I appreciate it!  But *moan*...

    rgb

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu







More information about the aic7xxx mailing list