boot sector f*ed

Steve Bertrand steve at ibctech.ca
Thu Aug 13 23:43:45 UTC 2009


PJ wrote:
> Ruben de Groot wrote:
>> Hi PJ,
>>
>> On Thu, Aug 13, 2009 at 09:53:06AM -0400, PJ typed:
>>   
>>> I apologize for the lengthy explanation below, but perhaps it will give
>>> some insight on what I see from this end:
>>>     
>> You probably won't get much of a helpful response. When troubleshooting, it's
>> always best to try to break down the problem in tiny bits and solve them
>> one by one, asking specific questions when you get stuck.
>>
>> <snip>
>>
>>   
>>> to be in a position to do what is required. For one thing, I do not know
>>> how I can save testing output to an external file when I am working on a
>>> temporary shell on the problem machine. Perhaps you could indicate what
>>> I should be doing or where to look for information.
>>>     
>> What kind of "temporary shell"? You mean the fixit console or livecd? You can
>> always redirect the output to some file in /tmp for example and then scp
>> it to another computer. Or mount_nfs or even mount_smbfs a windows share and 
>> save the output there.
>>
>>   
>>> And in checking the disks with fdisk, fsck, and even running that weird
>>> regenerate program... I wasn't able to come up with anything
>>> significant... that is, the configuration of the disks seemed to be ok,
>>> the boot sector was ok as it was able to boot but then when the system
>>> was being mounted something went wrong... and looking back, I vaguely
>>> recall something about a "soft update" or something like that which
>>> seems to indicate some stumbling block in the software and not hardware.
>>>     
>> soft updates inconsistencies perhaps? They can be caused by faulty hardware.
>> Or by power failure.  What did you do about them? In such a situation the
>> system will drop you into single user mode where you can do an fsck. 
>>
>>   
>>> All that I am seeing is that there is either a problem with the bios
>>> (which I even reinstalled and that changed nothing in the functioning)
>>> or something is going on with the OS.
>>>     
>> How exactly did you see this? And you reinstalled the BIOS ???
>>
>>   
>>> I now have set up another instance of 7.2 on a different disk on the
>>> 2.4ghz machine and I already find something strange... after installing
>>> the minimum configuration, I installed the packages - samba3.3.3,
>>> cvsup-without-gui, and smartmontools. I tried to run smartctl and cvsup
>>> but nothing worked. The path variable was correct but the shell just
>>> would not pick up on it. I had to start the programs from their directories.
>>> That just doesn't make sense.
>>>     
>> It does if your shell is csh (the default shell for root). You must issue
>> the "rehash" command to re-read everything in your path after installing new
>> software.
>>
>> Ruben
>>   
> Thanks Ruben,
> Frankly, I don't know anymore what I'm doing nor what is going on... it
> used to be so easy to set up FBSD even if it took a lot of time to
> compile... but it seems to be getting less and less intuitive and user
> friendly.

Perhaps you are becoming a bit anxious due to frustration. The intuition
part of FreeBSD hasn't changed. For those who still like to do things
the 'old' way, it still works the same way it always has. For those who
like the new/updated tools, they work too.

> How can I break things up into little bits and pieces without just
> smashing the whole show to bits and pieces ;-)

> There are so many problems, I have no idea where to begin.

Pen and paper. Write down the big problem, even if it's simply "it
doesn't work".

From there, have a coffee, have a smoke, or do what you like to do in
order to clear your mind to the point where you are facing the situation
from scratch; in a calm, non-biased manner. Then focus on the first
issue that crops up...write it down. If this initial problem is a
show-stopper, start there. If it is not, continue despite the problems,
and jot down the next roadblock.

When I'm faced with a seemingly insurmountable aggregated bunch of
problems, I prefer to "halve" the issue until I can recognize where the
issue(s) are, and at the same time, I usually make notes on what works.
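
To make the "halving" idea concrete, here is a minimal sketch in Python.
It is only an illustration, not something anyone in this thread ran: the
step names and the works_through() check are hypothetical, and it
assumes that once a step breaks the system, every later check fails too
(an intermittent hardware fault would violate that assumption).

```python
from typing import Callable, List, Optional, Sequence


def first_failing_step(
    steps: Sequence[str],
    works_through: Callable[[int], bool],
    notes: Optional[List[str]] = None,
) -> Optional[int]:
    """Return the index of the first step that breaks things, or None.

    works_through(i) is a hypothetical check: True if the system still
    works with steps[0..i] applied.
    """
    # The first bad index lies in lo..hi; hi == len(steps) means "none found".
    lo, hi = 0, len(steps)
    while lo < hi:
        mid = (lo + hi) // 2
        if works_through(mid):
            if notes is not None:
                notes.append(f"works through: {steps[mid]}")  # note what works
            lo = mid + 1  # the problem is in the later half
        else:
            hi = mid  # the problem is at mid or earlier
    return lo if lo < len(steps) else None


# Hypothetical example: pretend the box breaks once smartmontools goes on.
steps = ["minimal 7.2 install", "samba", "cvsup-without-gui",
         "smartmontools", "restored files"]
notes: List[str] = []
bad = first_failing_step(steps, lambda i: i < 3, notes)
print(steps[bad] if bad is not None else "no failing step found")
```

In practice works_through() is you, rebooting and testing by hand; the
point is only that each test rules out half of what is left, and that
the notes on what works are kept as you go.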

> Right now I'm just fixing up a new set up of 7.2 on another disk and
> we'll see what that does. 

...now stop, and take a little break and reflect on what the strategy
should be for the next step.

> Then I will re-setup the files I had recovered,
> see if they work

...now take another little break, and reflect again. Reflect on what you
have done, and about what you are about to do. This brainstorming may
help develop different tactics.

> and then do a last and final install of everything and
> see if that works. 

...break.

> And if there is a problem then, then I will know for
> sure that it is not a hardware problem.

You can *never* know for *sure* that hardware isn't an issue. Something
like minor electrical damage (for instance) can be exceptionally
intermittent, slowly progressive, extremely hard to troubleshoot, and
may disappear for months before it rears its head again.

Another hard-to-troubleshoot aspect of hardware is the edge cases. I've
run into issues in the past where I knew there was a problem, but it
took days to solve (this is an ISP environment, so when there is a
critical emergency, hardware is swapped, and then tested in the lab).

We ran into an issue where a box would sometimes freeze, and sometimes
reboot. The rest of the time it ran fine, even under extreme loads. It
turned out to be the motherboard... some of the protective coating had
'flaked off' between the CPU and the RAM. Only under extreme duress and
very specific memory access were we able to identify where the problem
was. No... I don't normally troubleshoot hardware that way (instead,
it's just replaced), but I was personally interested to find out why I
could not retrigger the problem.

> In using computers, in general,
> over the past 20 plus years I have only had maybe 6 crashes... mostly
> Winbloz 

...Windows...

> and about 3 with FBSD - and only 1 was because of defective
> hardware (a disk)... 

I've done this for ~15 years, and have seen at least 50 very serious
crashes. For almost all of them I have found and fixed the problem myself,
or have mentored those who want/need to learn how to fix it.

The relative scale perhaps differs; however, my colleagues/staff are all
trained on two very simple, and very important principles:

- do not panic
- do not rush

Either one leads to frustration, and nearly always has the effect of
making the problem worse. Calmly halve the problem until you find out
what it is. Even when there are 10k clients waiting for a fix, it's
better to get it done right, instead of making it worse.

> the rest was power outs and 1 erroneous shutdown...
> not bad ... and I never lost irreplaceable files. :-)   Took some time
> to recover them, but recover did as recover should.

...backup...always. Even if you don't think you need it, back it up anyway.

> Oh, well, before I give it all up, I'm giving it one final shot.

This last sentence makes me believe that you are coming at this with a
mindset of "fsck it... if it don't work this time, that's it", which I
have found will set your mind up for failure, as that's what you've
trained your brain to expect. After your brain knows _I'll quit after
this_, you will work as diligently as possible to ensure that it happens.

Take a break... there have been many, many well-written attempts to
help. All of those who have helped already will still be here no matter
what!

Steve