9.2 Boot Problem

Doug Hardie bc979 at lafn.org
Tue Apr 29 05:26:11 UTC 2014


On 13 April 2014, at 00:38, dteske at FreeBSD.org wrote:

> 
> 
>> -----Original Message-----
>> From: Doug Hardie [mailto:bc979 at lafn.org]
>> Sent: Saturday, April 12, 2014 7:08 PM
>> To: freebsd-stable at freebsd.org
>> Cc: dteske at FreeBSD.org Teske; Chris H
>> Subject: Re: 9.2 Boot Problem
>> 
>> 
>> On 10 April 2014, at 14:23, Doug Hardie <bc979 at lafn.org> wrote:
>> 
>>> 
>>> On 9 April 2014, at 16:53, Doug Hardie <bc979 at lafn.org> wrote:
>>> 
>>>> 
>>>> On 9 April 2014, at 14:17, dteske at FreeBSD.org wrote:
>>>> 
>>>>> 
>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: Chris H [mailto:bsd-lists at bsdforge.com]
>>>>>> Sent: Wednesday, April 9, 2014 2:03 PM
>>>>>> To: Doug Hardie
>>>>>> Cc: freebsd-stable at freebsd.org List
>>>>>> Subject: Re: 9.2 Boot Problem
>>>>>> 
>>>>>>> 
>>>>>>> On 9 April 2014, at 13:49, "Chris H" <bsd-lists at bsdforge.com> wrote:
>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On 9 April 2014, at 11:29, "Chris H" <bsd-lists at bsdforge.com> wrote:
>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On 4 April 2014, at 21:08, Doug Hardie <bc979 at lafn.org> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> I put this out on Questions, but got no responses. Hopefully
>>>>>>>>>>>> someone here has some ideas.
>>>>>>>>>>>> 
>>>>>>>>>>>> FreeBSD 9.2.  All of my systems are hanging during boot right
>>>>>>>>>>>> after the screen that has the picture.  It's as if someone hit
>>>>>>>>>>>> a space on the keyboard.  However, these systems have no
>>>>>>>>>>>> keyboard.  If I plug one in, or use the serial console, and
>>>>>>>>>>>> enter a return, the boot continues properly.
>>>>>>>>>>>> 
>>>>>>>>>>>> The boot menu is displayed along with Beastie.  However, the
>>>>>>>>>>>> line that says Autoboot in n seconds. never appears.  It just
>>>>>>>>>>>> stops there.  These are all new installs from CD systems.
>>>>>>>>>>>> I just used freebsd-update to take a toy server from 9.1 to
>>>>>>>>>>>> 9.2 and it doesn't exhibit this behavior.  It boots properly.
>>>>>>>>>>>> I have updated one of the production servers with the latest
>>>>>>>>>>>> 9.2 changes and it still has the issue.  I first thought that
>>>>>>>>>>>> some config file did not get updated properly on the CD.  I
>>>>>>>>>>>> have dug around through the 4th files and don't see anything
>>>>>>>>>>>> obvious that would cause this.  I have now verified that all
>>>>>>>>>>>> the 4th files in boot are identical (except for the version
>>>>>>>>>>>> number.  They are slightly different).  I don't believe this
>>>>>>>>>>>> is a BIOS setting issue as FreeBSD 7.2 didn't exhibit this
>>>>>>>>>>>> behavior.  All 4 systems are on totally different
>>>>>>>>>>>> motherboards.
>>>>>>>>>>>> 
>>>>>>>>>>>> I tried setting loader_logo="none" in /boot/config.rc and
>>>>>>>>>>>> that eliminated the menu and Beastie.  I think the system
>>>>>>>>>>>> completed booting, but the serial console was then dead.  It
>>>>>>>>>>>> did not respond or output anything.  I had to remove that and
>>>>>>>>>>>> reboot to get the console back again.
>>>>>>>>>>>> 
>>>>>>>>>>>> I need to get this fixed as these are production servers that
>>>>>>>>>>>> are essentially unmanned so it's difficult to get them back up
>>>>>>>>>>>> again.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> No response here either.  Surely someone must know the loader.
>>>>>>>>>>> I have been digging through the code, and can't find any
>>>>>>>>>>> differences between the systems that work and those that don't.
>>>>>>>>>>> Is there any way to debug this?  Is there a way to find out
>>>>>>>>>>> where the loader is sitting waiting on input from the terminal?
>>>>>>>>>>> That might give a clue as to why it didn't autoboot.
>>>>>>>>>>> 
>>>>>>>>>> OK. This is the first I've seen of your post. I'm not going to
>>>>>>>>>> profess being an expert. But I might suggest adding the
>>>>>>>>>> following to
>>>>>>>>>> loader.conf(5)
>>>>>>>>>> 
>>>>>>>>>> verbose_loading="YES"
>>>>>>>>>> boot_verbose="YES"
>>>>>>>>>> 
>>>>>>>>>> This raises the "noise level". Maybe that will help to provide
>>>>>>>>>> you with a bit more information, as to what, or if, you're
>>>>>>>>>> booting. DO have a look through /boot/defaults/loader.conf for
>>>>>>>>>> more hints, as to what, and how you can control the boot
>>>>>>>>>> process. As well as /etc/defaults/rc.conf.
>>>>>>>>>> In fact, you can pre-decide what, and how, to boot. Even
>>>>>>>>>> passing by the boot menu entirely.
>>>>>>>>> 
>>>>>>>>> Thanks Chris.  I did that and here is what I get:
>>>>>>>>> 
>>>>>>>>> Rebooting...
>>>>>>>>> cpu_reset: Stopping other CPUs
>>>>>>>>> /boot.config: -Dh
>>>>>>>>> Consoles: internal video/keyboard  serial port
>>>>>>>>> BIOS drive A: is disk0
>>>>>>>>> BIOS drive C: is disk1
>>>>>>>>> BIOS 640kB/2087360kB available memory
>>>>>>>>> 
>>>>>>>>> FreeBSD/x86 bootstrap loader, Revision 1.1
>>>>>>>>> (doug at zool.lafn.org, Tue Apr  8 20:30:20 PDT 2014)
>>>>>>>>> Loading /boot/defaults/loader.conf
>>>>>>>>> Warning: unable to open file /boot/loader.conf.local
>>>>>>>>> /boot/kernel/kernel text=0xdb3171 data=0xf3c04+0xbb770 syms=[0x4+0xeda80+0x4+0x1b8ebf]
>>>>>>>>> zpool_cache...failed!
>>>>>>>>> \
>>>>>>>>> H[Esc]ape to loader prompt_   _____ _____
>>>>>>>>> |  ____|             |  _ \ / ____|  __ \
>>>>>>>>> | |___ _ __ ___  ___ | |_) | (___ | |  | |
>>>>>>>>> |  ___| '__/ _ \/ _ \|  _ < \___ \| |  | |
>>>>>>>>> | |   | | |  __/  __/| |_) |____) | |__| |
>>>>>>>>> | |   | | |    |    ||     |      |      |
>>>>>>>>> |_|   |_|  \___|\___||____/|_____/|_____/    ```
> `
>>>>>>>>>                                         s` `.....---.......--.```
> -/
>>>>>>>>> +            Welcome to FreeBSD           + +o   .--`         /y:`
> +.
>>>>>>>>> |                                         |  yo`:.            :o
> `+-
>>>>>>>>> |  1. Boot Multi User [Enter]             |   y/        3;46H /
>>>>>>>>> |  2.--  /                                |
>>>>>>>>> |                                         |
>>>>>>>>> |  4. Reboot                              | `:
> :`
>>>>>>>>> |                                         | `:
> :`
>>>>>>>>> |  Options:                                  /
> /
>>>>>>>>> |  5. Configure Boot [O]ptions...            .-
> -.
>>>>>>>>> |                                             --
> -.
>>>>>>>>> |                                              `:`
> `:`
>>>>>>>>> |                                                .--
> `--.
>>>>>>>>> |                                                   .---.....----.
>>>>>>>>> +-----------------------------------------+
>>>>>>>>> 
>>>>>>>>>                                            FreeBSD `Nakatomi
>>>>>>>>> Socrates' 9.2
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Now it waits for a return.  I have tried changing the logo,
>>>>>>>>> setting the autoboot timeout and a couple others.  The only
>>>>>>>>> thing that did anything different was setting the logo to an
>>>>>>>>> invalid value.  Basically the console was dead after that, but
>>>>>>>>> the system did boot.  I never see the Auto Boot in n seconds
>>>>>>>>> message.  It's also interesting that the list of options above
>>>>>>>>> appears incomplete.  On the working system, items 1 through 5
>>>>>>>>> are all present.  I have now checked all the cksums for all
>>>>>>>>> the files in /boot and they are all the same.
>>>>>>>>> 
>>>>>>>> Hmmm. Looks like you're going to make me do all your research, for
>>>>>>>> you. ;)
>>>>>>>> You /did/ read the contents of /boot/defaults/loader.conf. Yes?
>>>>>>>> I'm guessing that you've also already read loader.4th(8), and the
>>>>>>>> other related info.
>>>>>>>> Now this is pure supposition; as it appears that you're looking
>>>>>>>> for a serial console. I'd /speculate/ that you want to turn all
>>>>>>>> that NASTY ANSI stuff OFF.
>>>>>>>> That's why you're not seeing the complete menu -- hear that Devin!
>>>>>>>> I'm going to post just this much for now, just to get you
>>>>>>>> started. I know what else you need/are looking for. But need to
>>>>>>>> find the /correct/ syntax -- paraphrasing, just won't get it. :)
>>>>>>> 
>>>>>>> Setting loader_color="NO"   (from man page)  does give back the full
>>>>>>> menu.  Still waits for return after the version name.  I haven't
>>>>>>> found in the forth where it is reading the keyboard.  Yes, I have to
>>>>>>> use a serial console.  These machines are about 100 miles away.
>>>>>>> Something is stopping the autoboot from even starting.
>>>>>> 
>>>>>> See my reply to this. I think I've given you the hints you need --
>>>>>> fingers crossed. :)
>>>>>> 
>>>>> 
>>>>> He's using console=comconsole (serial boot).
>>>>> When that is the case, loader_color is automatically set to NO.
>>>>> There's no reason to set both loader_color=NO and console=
>>>>> comconsole. The code that does this is here:
>>>>> 
>>>>> http://svnweb.freebsd.org/base/release/9.2.0/sys/boot/forth/color.4th?revision=255898&view=markup
>>>>> Line 48 within the loader_color? function:
>>>>> 	boot_serial? if FALSE else TRUE then
>>>>> 
>>>>> As for answering the quandary of where the keyboard is polled during
>>>>> the timeout countdown, that's the getkey function in here:
>>>>> 
>>>>> 
>>>>> http://svnweb.freebsd.org/base/release/9.2.0/sys/boot/forth/menu.4th?revision=255898&view=markup
>>>>> --
>>>> 
>>>> 
>>>> 
>>>> I commented out the 3 cursor positions in menu-timeout-update.  It
>>>> does not appear that word is being used.  The Autoboot message never
>>>> appeared.  Obviously getkey is being used as it does respond properly
>>>> to a return.  I am beginning to suspect that menu_timeout_enabled is
>>>> zero.  I believe adding a line after getkey's begin with
>>>> 
>>>>      s"menu_timeout_enabled = " type menu_timeout_enabled @ . 10 spaces
>>>> 
>>>> will tell me.
>>> 
>>> 
>>> 
>>> There is a missing space after the first " above.  However, that does
>>> confirm my suspicion that menu_timeout_enabled is set to 0.  It is only
>>> displayed once.  On a working system the value is 1 and that message is
>>> output numerous times until the 10 seconds expires and then the boot
>>> begins.
>>> 
>>> Now to figure out how that value is getting set incorrectly.
>>> 
>> 
>> After much digging, I now know what is going on, but not why.  When getkey
>> is called the first time, menu_timeout_enabled is set to one.  However, it
>> is set to zero on every check after that.  In getkey, after the comment
>> "Was a key pressed", is a check of key to see if a key was pressed.  It is
>> returning a decimal 7 (BEL).  That then clears menu_timeout_enabled and it
>> then sits there waiting for a valid key input.  There is no keyboard
>> plugged into the system.  I have no idea how that BEL is being generated
>> or even how to prevent it.  Could it be possible that it comes from the
>> serial console?  I tend to doubt that's the case since the system hangs
>> during boot when the serial console is not connected.  I suppose that I
>> could put in a test for a key value that is not a control character, but
>> that would only work until the next system update.  I'd have to remember
>> to put it back in each time.  That's not likely to happen.  My memory is
>> not that good.  What's interesting is that I have 4 systems (i386) doing
>> this and 1 system (i386) and 2 systems (amd64) not doing it.  The only
>> common thread is the 4 systems doing it are about 100 miles from me and
>> the working ones are here.
>> 
> 
> Based on that feedback, I've developed the attached patch.txt.
> Can you give it a whirl and let me know how it works?

The patch works properly.  However, in the process of testing it, I discovered that the cause of the "bell" is actually the terminal emulator echoing that character back from something earlier in the reboot process.  Why that character is sent is not understood.  Hence, the real problem lies in a hardware "failure" outside the motherboard.  So I don't know if you want to put that patch into the system or not.  It seems like a good idea to ignore anything that's a control character, or to clear out the input at the start of the process anyway.
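
To make the two ideas above concrete (clear out pending input before the countdown starts, and ignore stray control characters while it runs), here is a rough model of the logic.  It is written in Python purely for illustration; it is not the loader's Forth and it is not the patch from this thread.  The names is_real_key, drain and countdown are made up for the sketch.  One assumption worth stating: Enter and Escape are themselves control characters and the menu uses both, so the sketch lets them through.

```python
"""Model of a boot-menu countdown that tolerates stray control characters.

Illustration only: hypothetical names, not the loader's Forth code and
not the patch discussed in this thread.
"""

CR, LF, ESC, BEL = 13, 10, 27, 7


def is_real_key(code):
    """Return True for Enter, Escape, or printable ASCII (32..126)."""
    return code in (CR, LF, ESC) or 32 <= code <= 126


def drain(queue):
    """Discard everything already queued before the countdown starts."""
    queue.clear()


def countdown(ticks, arrivals, pending=None):
    """Poll for input `ticks` times.

    `arrivals` maps a poll index to the key code that shows up at that poll.
    `pending` is input that was already waiting before the menu was drawn.
    Returns ("key", code) as soon as a real key is seen, otherwise
    ("autoboot", None) once the countdown expires.
    """
    queue = list(pending or [])
    drain(queue)  # idea 1: throw away stale input such as an echoed BEL
    for tick in range(ticks):
        if tick in arrivals:
            queue.append(arrivals[tick])
        while queue:
            code = queue.pop(0)
            if is_real_key(code):
                return ("key", code)
            # idea 2: a stray control character does not stop the countdown
    return ("autoboot", None)
```

In this model a BEL that is already waiting, or one that arrives part way through, leaves the countdown running, so countdown(10, {3: BEL}) ends in autoboot, while countdown(10, {3: CR}) stops on the return as before.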

In my case, I need the patch and will keep it in my systems.

Thanks for all the help.

-- Doug


