How to troubleshoot a frozen boot sequence
Billy Newsom
billy at nlcc.us
Mon Jan 25 01:10:30 UTC 2010
I am not sure why it works, but here is my solution.
I determined through a lot of poking that the Master Boot Record of each
drive was different. Here is what I found out:
1. My backup drive (ad0) had the FreeBSD boot manager installed.
2. My main drive (twed0) had the FreeBSD MBR installed.
So, what is the problem? All I could figure out was to install the boot
manager (boot0, installed with boot0cfg) onto my main drive. Silly, but it
worked.
Why, I don't have a clue. I do, by the way, remember purposely using this
setup when I ran sysinstall to configure this machine. I felt that the ad0
drive needed a boot manager (just in case it was used someplace else) and the
main drive would not need a boot manager. But nothing ever indicated to me
that a standard MBR on twed0 would not work if ad0 was missing.
Here is my partition table from twed0:
# /dev/twed0
g c60801 h255 s63
p 1 0xa5 63 976768002
a 1
Notice there is just one partition and it is active. But it wouldn't boot
until I ran:
boot0cfg -B twed0
which keeps the slice table the same.
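The partition listing above appears to be in the format printed by
"fdisk -p", so the "one slice, and it is active" observation can be checked
mechanically. A minimal sketch, assuming that format; summarize_slices is a
made-up helper name, not a FreeBSD tool:

```shell
#!/bin/sh
# Summarise "fdisk -p" style text read from stdin.
# Counts the "p" (slice) lines and reports which slice the "a" line marks
# as active, together with that slice's type byte.
summarize_slices()
{
    awk '
        $1 == "p" { n++; type[$2] = $3 }   # p <slice> <type> <start> <length>
        $1 == "a" { active = $2 }          # a <slice>
        END {
            printf "slices=%d active=%s type=%s\n", n, active, type[active]
        }'
}
```

On the machine itself this would be run as "fdisk -p twed0 | summarize_slices";
a type of 0xa5 is the FreeBSD slice type.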
Once I was done, the server would boot with or without the ad0 drive. To
cope with a backup drive failure, I also had to mess with fstab:
1. I had to add the "noauto" option, as someone suggested.
2. I had to disable the fsck pass for that entry (a pass number of 3 didn't
work, so I set it to 0), or an fsck failure will boot single user.
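Before pulling the drive for a test, it may be worth confirming that the
fstab entry really has both changes. A rough sketch, assuming the usual
six-field fstab layout; fstab_entry_is_optional is a hypothetical helper
name:

```shell
#!/bin/sh
# Succeeds (exit 0) if the fstab entry for mount point $1, read from stdin,
# carries the noauto option and has an fsck pass number of 0.
# A missing pass field is treated as 0.
fstab_entry_is_optional()
{
    awk -v mp="$1" '
        $1 !~ /^#/ && $2 == mp {
            found = 1
            # Wrap the option list in commas so "noauto" matches as a whole word.
            ok = ((("," $4 ",") ~ /,noauto,/) && (($6 + 0) == 0))
        }
        END { exit !(found && ok) }'
}
```

Usage would be along the lines of
"fstab_entry_is_optional /disk250 < /etc/fstab && echo safe to boot without it".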
My question is now, do I write a script to mount the drive (too late, I did)
during boot and then to run fsck also? I am not sure how fsck should be run,
but I assume it is kind of important.
My main challenge was determining when to mount the disk. Here is my solution
and my script so far that seems to work.
=====================
#!/bin/sh
# mounts my special drive
# TODO: Need to fsck it
# PROVIDE: mountbackup
# REQUIRE: mail
# KEYWORD: nojail
. /etc/rc.subr
name="mountbackup"
start_cmd="mountbackup_start"
stop_cmd=":"
THIS="/disk250"
HOSTNAME=`/bin/hostname`
MAILTO=root@${HOSTNAME}
TOD=`/bin/date`
mountbackup_start()
{
    local err
    # Mount "backup" filesystems.
    echo -n "Mounting $THIS Backup filesystems"
    mount $THIS
    err=$?
    echo '.'
    case ${err} in
    0)
        ;;
    *)
        echo "Mounting $THIS filesystems failed," \
            " but it's okay for now. Sending mail to $MAILTO"
        # Group the message lines in a subshell so all of them reach mail.
        (echo " Mounting $THIS filesystems failed on boot!"
        echo " "
        echo "Host: $HOSTNAME Date: $TOD") | \
            mail -s "FAILURE to mount $THIS on $HOSTNAME" $MAILTO
        ;;
    esac
}
load_rc_config $name
run_rc_command "$1"
=====================
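On the open fsck question: one possible approach is to run fsck in preen
mode on the backup slice and mount only if it comes back clean. This is an
untested sketch, not a recommendation. The device and mount point are taken
from the fstab quoted further down; check_and_mount and the FSCK and MOUNT
override variables are made-up names, there only so the commands can be
swapped out when trying the logic.

```shell
#!/bin/sh
# Sketch: check the backup filesystem, then mount it.
# All four settings can be overridden from the environment.
BACKUP_DEV="${BACKUP_DEV:-/dev/ad0s1d}"   # from the fstab in this thread
BACKUP_MNT="${BACKUP_MNT:-/disk250}"
FSCK="${FSCK:-fsck -p}"                   # -p is preen mode
MOUNT="${MOUNT:-mount}"

# Returns 1 if the device node is missing, 2 if fsck fails,
# otherwise the exit status of the mount command.
check_and_mount()
{
    if [ ! -e "$BACKUP_DEV" ]; then
        echo "$BACKUP_DEV not present, skipping"
        return 1
    fi
    if ! $FSCK "$BACKUP_DEV"; then
        echo "fsck of $BACKUP_DEV failed, not mounting"
        return 2
    fi
    $MOUNT "$BACKUP_MNT"
}
```

In the rc.d script above, the bare "mount $THIS" line could be replaced by a
call to check_and_mount, with the existing case statement handling the
non-zero results.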
Billy Newsom wrote:
> Nathan Vidican wrote:
> > To me, it sounds like you have two issues to deal with here:
> >
> > #1 - booting off of the twed0 disk, what is your systems' BIOS currently
> > set to boot from, from the way you describe it's almost as if the system
> > is booting from ad0 - in which case yes, you will have to put a valid
> > boot config onto twed0
>
> I feel that I have run across a common and old "SCSI v IDE" battle (The
> FreeBSD Handbook still talks about it). Even though I make the drive
> controller (the twe = 3Ware SATA controller) as my first boot drive in
> BIOS (effectively 0x80 as I understand it), FreeBSD does not ever pay
> attention to the BIOS's numerical order. (See my reason below*) It wants
> to find stuff on ad0 and boot that drive if it exists.
>
> My supposition is that since I had twe0 and ad0 running during my 7.2
> install, that the correct drive partition and MBR stuff were applied to
> get it to boot AS-IS, but...
>
> When it is not set up as it is now, it freezes at the boot loader,
> attempting to find ad0.
>
> It is either
>
> a. Finding ad0 in fstab and really wishing it was there
> or
> b. The boot strap code is physically on ad0 and not twed0 because the
> Sysinstall process never wrote it there.
>
> I think it is b. If b, the boot process may be:
>
> Stage 1: BIOS picks twe0 to be the first drive to attempt a boot.
> Stage 2: MBR (boot 0) -- located on twe0
> Stage 3: boot1 -- located on twed0 (BTX Boot Loader?)
> Stage 4: boot2 -- located on ad0 (FreeBSD/i386 bootstrap loader 1.1?)
> Stage 5: Boot Loader -- shows menu on twed0s1a
> Stage 6: Kernel boots up on twed0s1a
>
> And so when I remove ad0 to simulate a backup drive failure, the stage 4
> tries to run a missing bootstrap loader from twed0.
>
> Stage 4: boot2 -- missing on twed0, system hangs.
>
> I think this is happening because it is the BTX loader which may find
> and concatenate the BIOS drives, getting confused, and switching the
> boot to ad0 for just the one stage that finishes the bootstrap.
>
> I think one solution is to (next time) not install my backup drive until
> after Sysinstall is long done! I think it's a sysinstall bug, some of this.
>
> * My Reason for saying that is my guess that the sysinstall program saw
> the ad0 as something important, and included it in the chain of the
> boot. For example, when I was done SLICING my drives in Sysinstall, the
> silly thing then got the "w" write command and went out there and made
> some (wrong) decisions under the assumption that ad0 would NATURALLY
> (via BIOS) be part of the boot process. So the right code never got
> written to twe0 in the right places. Sure, it got all the kernel and I
> told it to put a standard FreeBSD MBR, but it must be missing something
> on track 0.
>
> > #2 - you could add the flag 'noauto' to ad0 from within fstab - this
> > will allow the system to boot without mounting the disk (alleviating the
> > dreaded single-user-mode). Use a startup script in /usr/local/etc/rc.d
> > to then mount the disk if available on bootup. I've done similar setups
> > to this before where we were using external USB drives for backup and
> > weren't 100% sure they'd always be connected in the case a server might
> > be rebooted - worst case, you'll end up with it not mounted, but the
> > system will still be up at least.
>
> I will give it a try. I need to do something to correct this second
> issue for certain. My ad0 is a good spare, but it's old.
>
> > --
> > Nathan Vidican
> > nathan at vidican.com <mailto:nathan at vidican.com>
> >
> >
> > On Fri, Jan 22, 2010 at 12:53 PM, Billy Newsom <billy at nlcc.us
> > <mailto:billy at nlcc.us>> wrote:
> >
> > I am doing a test run on a production server. It has 2 hard drives.
> >
> > ad0 (mounted on /disk250 in a single slice plus SWAP)
> > twed0 (mounted on / /var /usr and a SWAP)
> >
> > The twed0 is a hardware mirror and my main drive.
> > ad0 is just for backups.
> >
> > What the issue is, and you probably know where I'm heading. The boot
> > process freezes if I remove the ad0 (to test a drive failure
> > condition)
> >
> > It freezes after saying:
> > BTX boot loader.... etc.
> >
> > FreeBSD/i386 bootstrap loader 1.1
> > It spins for a second, then stops... unless I have ad0 in the
> > computer.
> > /boot/kernel/kernel text=0x7b03a0 data=0xcdee0 /
> >
> > And it never gets to the boot menu.
> >
> > So:
> >
> > 1. Should I put a new boot0config on the twed0 drive? If so do I
> > boot from a CD to do that?
> >
> > I need to potentially do something also to my disk labels and my
> > fstab so that I don't boot to single user mode if drive ad0 fails. I
> > haven't done this exact type of thing before, so I am looking for a
> > little help.
> >
> > my fstab:
> > /dev/ad0s1b     none      swap    sw         0  0
> > /dev/twed0s1b   none      swap    sw         0  0
> > /dev/twed0s1a   /         ufs     rw         1  1
> > /dev/ad0s1d     /disk250  ufs     rw         2  2
> > /dev/twed0s1e   /tmp      ufs     rw         2  2
> > /dev/twed0s1f   /usr      ufs     rw         2  2
> > /dev/twed0s1d   /var      ufs     rw         2  2
> > /dev/acd0       /cdrom    cd9660  ro,noauto  0  0
> >
> >
> > I tried to read the MBR from the twed0 drive, and the program
> > couldn't read it. The one from the ad0 drive is readable and I saved
> > a copy of it.
>