boot sector f*ed

Ian Smith smithi at nimnet.asn.au
Fri Aug 14 03:42:00 UTC 2009


On Thu, 13 Aug 2009, PJ wrote:
 > Subject: Re: boot sector f*ed
 > 
 > Roland Smith wrote:
 > > On Wed, Aug 12, 2009 at 03:54:31PM -0400, PJ wrote:
 > >   
 > >> Well, I've been looking at the disk(s) and I have found some interesting
 > >> "shei**e" that doesn't make sense.
 > >> 1. The fbsd minimal installation that I had set up for recovery of the
 > >> previous crash does not boot... Now, why in Hades is that? I hadn't
 > >> touched the disk since last using it to look at the corrupted disk
 > >> through a USB connection. The current crashed installation was done
 > >> afterwards and the only change was in the bios to set the boot disk to
 > >> the new installation. The installation was finally completed with all
 > >> the programs working fine... and then BOOM!
 > >> 2. I tried booting from all the disks on the machine (4 disks) and only
 > >> the current crashed one booted!... so, it's not the boot sector at
 > >> all... something is screwy on this machine; either the motherboard is
 > >> buggered (which I doubt, but not entirely), the disks are finished or
 > >> there's some kind of gremlin lurking in the confines of the box.
 > >>     
 > >
 > > This sounds more and more like hardware troubles. 

Indeed it does, more and more since.

 > > A few things to check (in order of decreasing likelihood IMHO):
 > > - Cables to the hard disks: Make sure they are properly connected. A machine of
 > >   mine suddenly started getting disk read errors after I put in another
 > >   graphics card. It turned out that the SATA connector to that drive had come
 > >   partially loose.
 > > - Power supply: check the voltages (preferably under load) with a monitoring
 > >   app like mbmon. If that's not possible, check in the BIOS. A failing
 > >   power supply can give weird unreproducible errors. If you have ever heard a
 > >   popping noise from the machine it could be a short in the power supply caused
 > >   by dust. I've seen that fry motherboards.
 > > - PCI cards: check that they are seated properly. Although in this case I'd
 > >   say this seems the least likely.

I usually bet first on the power supply .. but it's not my money :)
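
For what it's worth, here is a rough sketch of how the voltage readings could be judged once you have them (from mbmon or the BIOS screen). It assumes the usual ATX figure of +/-5% on the +3.3V, +5V and +12V rails; the function name and the sample readings are made up for illustration, and motherboard sensors are often not very accurate, so treat a marginal result as a hint rather than proof.

```python
# Nominal positive ATX rails; +/-5% is the commonly quoted tolerance for these.
NOMINAL_RAILS = {"+3.3V": 3.3, "+5V": 5.0, "+12V": 12.0}
TOLERANCE = 0.05


def out_of_tolerance(readings, tolerance=TOLERANCE):
    """Return {rail: deviation in percent} for rails outside the tolerance band.

    `readings` maps a rail name from NOMINAL_RAILS to the measured voltage.
    """
    bad = {}
    for rail, measured in readings.items():
        nominal = NOMINAL_RAILS[rail]
        deviation = (measured - nominal) / nominal
        if abs(deviation) > tolerance:
            bad[rail] = round(deviation * 100, 1)
    return bad


# Hypothetical readings, typed in by hand from the monitoring output.
sample = {"+3.3V": 3.31, "+5V": 4.6, "+12V": 12.1}
print(out_of_tolerance(sample))
```

A rail that sags only under load is the more telling sign, so readings taken while the box is busy are worth more than idle ones.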

 > I apologize for the lengthy explanation below, but perhaps it will give
 > some insight on what I see from this end:

Snipping heavily; some useful background but a bit much, no disrespect.

 > machine. The troubles began when I tried to install flashplayer on the
 > 7.1 machine.

Or could be a hardware issue that became evident around the same time?

 > bash4 and fluxbox for X. Everything seemed to work fine. I ran all the
 > programs and saw that all the files I had recovered from the crash were
 > recovered and working. Man, was I ever happy!
 > I shut down for the night and looked forward to getting back to normal
 > development of my current projects.
 > In the morning, I boot up and WHAM!... the system is f**cked. And so am I.

Have you tried swapping the power supply?  I assume you've swapped the
cables, removed, cleaned and replaced cards, checked CPU temperature etc?

 > Now, the problem is that it is imperative to be able to figure out what
 > exactly is going on.  Well, the problem with that is that I do not seem
 > to be in a position to do what is required. For one thing, I do not know
 > how I can save testing output to an external file when I am working on a
 > temporary shell on the problem machine. Perhaps you could indicate what
 > I should be doing or where to look for information.

Steve Bertrand addressed those issues most recently; good advice, 
especially "don't panic" :)
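
On saving test output to a file: one possible approach, sketched below, is a small wrapper that runs a command and appends everything it prints to a log on some other medium (a USB stick, say). This assumes Python 3 is available in whatever environment you boot, which a minimal rescue shell may well not offer; in that case plain shell redirection does the same job. The function name and any paths are hypothetical.

```python
import subprocess
from datetime import datetime, timezone


def run_and_log(command, log_path):
    """Run `command` (a list of arguments), append its output to log_path.

    stderr is folded into stdout so errors stay in order with normal output.
    Returns the command's exit code.
    """
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
    with open(log_path, "a") as log:
        # One header line per run, so repeated tests can be told apart later.
        log.write(f"=== {stamp} :: {' '.join(command)} (exit {result.returncode}) ===\n")
        log.write(result.stdout)
        if not result.stdout.endswith("\n"):
            log.write("\n")
    return result.returncode
```

For example, run_and_log(["dmesg"], "/mnt/usb/tests.log") would append the kernel messages to a log on a stick mounted at /mnt/usb (a made-up mount point). Appending rather than overwriting means an intermittent fault leaves a history you can compare across boots.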

 > Another problem is rather a strange quirk or I don't know what - The
 > problems I am having are on two very similar machines: 1 is a MSI 6758
 > 875P NeoFisr motherboard running on a Pentium 3.0GHz CPU; the other is
 > the identical board with a Pentium 2.4GHz CPU. The strange thing is that
 > even with identical bios, the bios does not act the same on both
 > machines. The final install that was so promising was on the 2.4GHz
 > machine. Except for being somewhat slower (I find it rather slow
 > compared to the 3GHz, but maybe that's normal) it always worked
 > without problems.

Have you tried putting the HD from the problem 2.4GHz machine in the 
3GHz box, to see if it behaves properly there?  Have you tried running 
various hardware diagnostic programs over the flakey box?  Overnight 
memtesting can reveal other problems even if memory is fine.  There are 
quite a few diagnostic CDs around; some boot Linux, some even DOS, but 
it doesn't really matter so long as they can prove/disprove the hardware.
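
In the same spirit, and only as a crude supplement to proper diagnostics, here is a sketch of a repeat-read consistency check: hash the same file several times and see whether the digests ever disagree. On sound hardware they never should. It assumes Python 3 is on hand; the function names are invented for the example. Note that repeat reads of a small file are mostly served from the OS cache, so this exercises RAM and the data path more than the disk itself unless the file is larger than memory.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def repeated_read_check(path, passes=5):
    """Read the file `passes` times and return the set of distinct digests.

    More than one digest for an unchanged file points at flaky hardware
    (memory, cabling, controller or disk), not at the file contents.
    """
    return {file_digest(path) for _ in range(passes)}
```

A result with a single digest proves little on its own; a result with two or more, on a file nobody is writing to, is a strong reason to stop trusting the box.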

 > anyway - I tried booting a minimal installation on the 2.4ghz machine
 > from a disk that was set up before the crashed disk was installed and
 > that boot did not work... there was no reason for it to not work... all

Smells like flakey hardware .. intermittent, inexplicable glitches.  It 
might survive hours on one workload, minutes on another, no sense to it?

 > All that I am seeing is that there is either a problem with the bios
 > (which I even reinstalled and that changed nothing in the functioning)
 > or something is going on with the OS.

After you've thoroughly proven the hardware is AOK under sustained and 
varied pressure, then you can suspect software issues - which tend to be 
far more consistent and repeatable - but if the hardware's acting flakey 
then you likely won't see any consistency in software issues, which does 
seem to concur with your descriptions to date.

 > I must say, it is weird that with FBSD 7.2 things have not been going
 > well at all...

Well, since that particular point in time, at least?

 > So, now I am trying to set up 7.2 on this new disk and let's see what
 > happens.
 > I'm going to run some tests on those disks, including the one that maybe
 > is defective... I'll post results if possible.

I wouldn't even try doing anything with disks until positively ruling 
out any hardware issues, given that at one stage you _apparently_ lost 
the boot sector, and it or another possibly intermittent issue prevented 
booting.  You've since said that the 3GHz box with same BIOS boots up
'differently'.  Has it always been so?

Sometimes, quite often in fact, I've found just disassembling, thorough 
cleaning, fresh heatsink paste maybe, and reassembly solves many issues, 
without ever knowing what exactly did the trick.  Life's like that .. 

HTH, Ian

