One more 2300 healthy (rats?)

Robert G. Brown rgb at phy.duke.edu
Tue Sep 22 07:34:04 PDT 1998


On Tue, 22 Sep 1998, Doug Ledford wrote:

> Robert G. Brown wrote:
> 
> > Anyway, the 9/19 cam boot.flp image booted my "sick" PowerEdge 2300,
> > found the 7890 device, and I could have proceeded to install freebsd if
> > I had any idea how to install freebsd from just one floppy.  It also
> > rewrote whatever internal register was causing the trouble, as
> > 5.1.0pre10 booted flawlessly immediately thereafter.  I have to confess
> > that I'm amazed that there is something that critical to function that
> > isn't cleared by a power cycle (including one where I jumpered NVRAM
> > clear and pulled the plug and punched the power button and...), but
> > there it is.
> 
> Now, the next real question is, if you go ahead and power that system down,
> unplug the cord, discharge the capacitors in the power supply, clear the
> NVRAM, and do everything else you can imagine to make that system go back to
> original, will the pre10 boot up without first booting the FreeBSD-CAM
> floppy?  If so, then I'm stumped and amazed.  To the best of my knowledge,
> everything the FreeBSD ahc cam driver sets and everything the linux aic7xxx
> driver sets are *all* volatile registers and locations in the sense that they
> go away with a power down.  So, if the machine is "fixed" so to speak now
> and no longer needs pre-booted after a power down to get pre10 to work, then
> something weird is going on.  Any clues what it might be in that case
> Justin?

Aww, and I already started to use the machine (yes, Virginia, I do
actually do MC simulations on these boxes...when they work).  Sigh.
I guess I do need to document that this recent "fix" via freebsd boot
survives powerdown.  I'll try the following:

  a) Power down, etc. and reboot as described.  My prediction is that it
will now work fine, because pre10 worked fine on systems that had never
been powered up before or that had been unplugged and cleared -- as
long as they were not already displaying this "hung" behavior. 

I really do think that something weird is going on because I see
differences in the boot-time behavior of nominally "identical" machines
-- something that shakes my belief in electronic determinism (a thing
that is none too strong anyway;-).  Allowing that any notion of a
WinDell "conspiracy" is nonsense (it was intended as a tongue-in-cheek
joke in the first place, and now of course Dell is working actively with
the linux community) there still appears to be solid evidence that there
is a non-volatile location in the 7890 subsystem on these systems that
survives total powerdown, the placement of the NVRAM-clearing jumper, an
adaptec-bios reset (in the card bios itself) and the POST/initialization
process, whatever it might be.  Here I'm at a disadvantage -- lacking
device specs I cannot speculate where such a location might be or how it
gets corrupted, but it does appear that it was corrupted in the Dells on
delivery and gets reset by WinNT and now freebsd on boot, but not by
pre10.

  b) So, I'll also try to power down, etc. and reboot an earlier image,
maybe pre3 or the like, that installed but then messed up.  By looking
at what a revision writes that "causes" the problem and what a revision
writes that leaves the problem alone, it may be possible to find a
location that was written to -- wrong -- that is now not written to at
all and that needs to be written to right.  This is going to be tricky:
of the four systems b1-b4 that I installed on the same day with the
same image, b3 and b4 are still running (a guy here has had them
running a calculation the entire time, so I haven't been able to
reinstall/reboot them).  It could be (if he ever finishes;-)
that when I power THEM down they won't come back -- that was my
experience with pre3 and at least one machine with pre7.  I haven't had
any opportunity to properly verify that the corruption problem itself is
reproducible, as only yesterday did I manage to fix one that was
corrupted!
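
The comparison described in (b) could be sketched roughly as follows,
assuming one had a trace of (location, value) writes from each driver
revision.  No such traces appear in this thread; the register names,
values and function names below are invented placeholders, not real
aic7xxx or 7890 registers.

```python
# Hypothetical sketch: REG_A/REG_B/REG_C and their values are made-up
# placeholders, not real 7890 registers or captured driver traces.

def last_writes(trace):
    """Collapse a list of (location, value) writes to the final value per location."""
    final = {}
    for location, value in trace:
        final[location] = value
    return final


def suspect_locations(bad_trace, good_trace):
    """Return locations the 'bad' revision wrote that the 'good' revision
    either never writes or leaves with a different final value."""
    bad = last_writes(bad_trace)
    good = last_writes(good_trace)
    suspects = {}
    for location, bad_value in bad.items():
        if location not in good:
            # Written by the old revision, untouched by the new one.
            suspects[location] = (bad_value, None)
        elif good[location] != bad_value:
            # Written by both, but with different final values.
            suspects[location] = (bad_value, good[location])
    return suspects


# Invented example traces standing in for an early and a late revision.
pre3_trace = [("REG_A", 0x01), ("REG_B", 0xFF), ("REG_C", 0x10)]
pre10_trace = [("REG_A", 0x01), ("REG_C", 0x20)]

for location, (old, new) in sorted(suspect_locations(pre3_trace, pre10_trace).items()):
    status = "no longer written" if new is None else f"now written as {new:#04x}"
    print(f"{location}: pre3 wrote {old:#04x}, {status}")
```

Locations reported with no newer value are the interesting ones here:
written (possibly wrongly) by the earlier revision and never rewritten
by the later one.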

> Yeah, don't "fix" that machine just yet :)  If you can, run the test above
> first.

It shall be carefully preserved in its dysfunctional state, except that
I will let Dell replace the bad RAM.  Hopefully it won't just suddenly
start to work when they do...

   rgb

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu



