amd64/111955: [install] Install CD boot panic due to missing BIOS smap on 5.5 through to 7.0-Current Snapshot 200704

Peter Wemm peter at wemm.org
Sat Mar 8 03:59:53 UTC 2008


On Fri, Mar 7, 2008 at 3:40 PM, Bob Johnson <bob89 at eng.ufl.edu> wrote:
> The following reply was made to PR amd64/111955; it has been noted by GNATS.
>
>  From: Bob Johnson <bob89 at eng.ufl.edu>
>  To: bug-followup at freebsd.org,
>   Eamon Roque <Roque at itg.uni-muenchen.de>
>  Cc:
>  Subject: Re: amd64/111955: [install] Install CD boot panic due to missing BIOS smap on 5.5 through to 7.0-Current Snapshot 200704
>  Date: Fri, 7 Mar 2008 17:48:41 -0500
>
>   "FreeBSD only calls the BIOS SMAP call from virtual 86 mode both
>   in the loader and in the i386 kernel. The fix is quite complicated and
>   involves rewriting the boot code to invoke BIOS calls from real mode
>   rather than virtual 86 mode."
>
>   ?? but FreeBSD i386 boots and runs fine on an HP dc7700 that gives the "No
>   BIOS SMAP" error when booting AMD64. I'm completely ignorant of the boot
>   process for AMD64, but could code be lifted from i386 and moved to AMD64 to
>   solve this?

Here's what actually happens and explains the differences.

On the i386 kernel, we can make bios calls in vm86 mode during startup
and have various code to find memory the "old" ways, using
increasingly poor alternatives.  It can fall back to bios calls and
memory locations that have limits of 512MB or 64MB of ram, etc.
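The "increasingly poor alternatives" approach is a best-first fallback chain. Below is a minimal Python model of that pattern. It assumes each probe is a hypothetical callable that returns a memory size in KB, or None on failure. The probe names and their order are illustrative and not necessarily what the i386 kernel does. The int 0x15 AH=0x88 call really does return a 16-bit KB count, which is why it tops out near 64MB.

```python
def size_memory(probes):
    """Try memory-sizing methods from best to worst.

    `probes` is a list of (name, callable) pairs; each callable is a
    hypothetical stand-in for one BIOS/legacy method and returns the
    memory size in KB, or None if that method failed.
    Returns (name, size_kb) from the first method that worked.
    """
    for name, probe in probes:
        size_kb = probe()
        if size_kb is not None:
            return name, size_kb
    raise RuntimeError("no memory-sizing method worked")


# Example: the BIOS refuses E820, so the chain settles for the next method.
probes = [
    ("int15_e820", lambda: None),         # refused (e.g. when called from vm86)
    ("int15_e801", lambda: 1024 * 1024),  # 1GB above 1MB
    ("int15_88",   lambda: 65535),        # 16-bit KB count: caps out near 64MB
]
print(size_memory(probes))  # → ('int15_e801', 1048576)
```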

The amd64 kernel cannot make vm86 mode calls or bios calls.  It is the
nature of the cpu mode: virtual 8086 mode does not exist once the cpu
is running in 64 bit long mode.  In theory, the kernel could have a
mini-32-bit sub-kernel inside it and switch between 64 bit mode and 32
bit mode on the fly in order to make vm86 calls, but that is a lot of work.

The AMD64 certification specs explicitly listed certain minimum bios
specs as part of the logo certification requirements.  For example,
they must be PC2001 at a minimum.  This means that it has to have USB,
ACPI, etc etc.  It has to have the 0xe820 memory map bios function
which completely specifies the memory layout in an ACPI-compliant
fashion.  It lists memory that is reserved for ACPI, etc.  Windows
logo certifications also require PC2001 or later these days.
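To make "completely specifies the memory layout" concrete, here is a minimal Python sketch of what an 0xe820 memory map looks like. It assumes the classic 20-byte entry layout: a 64-bit base, a 64-bit length, and a 32-bit type. ACPI 3.0 BIOSes may append a 32-bit attributes field, and this sketch ignores it. Only the common type codes are listed. The names `SmapEntry`, `parse_smap`, and `usable_bytes` are hypothetical and do not come from FreeBSD, and the sample table is made up.

```python
import struct
from dataclasses import dataclass

# Common E820/ACPI address-range type codes.
SMAP_TYPE_NAMES = {1: "usable", 2: "reserved", 3: "ACPI reclaimable", 4: "ACPI NVS"}

ENTRY_FORMAT = "<QQI"                       # base, length, type (little-endian)
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)  # 20 bytes, no padding with "<"


@dataclass
class SmapEntry:
    base: int
    length: int
    type: int


def parse_smap(buf):
    """Split a buffer of back-to-back 20-byte entries into SmapEntry records."""
    if len(buf) % ENTRY_SIZE:
        raise ValueError("buffer is not a whole number of SMAP entries")
    return [SmapEntry(*struct.unpack_from(ENTRY_FORMAT, buf, off))
            for off in range(0, len(buf), ENTRY_SIZE)]


def usable_bytes(entries):
    """Total memory the OS may use freely (type 1 only)."""
    return sum(e.length for e in entries if e.type == 1)


# Made-up sample table.
raw = b"".join(struct.pack(ENTRY_FORMAT, *e) for e in [
    (0x00000000, 0x0009FC00, 1),   # conventional memory below 640K
    (0x0009FC00, 0x00000400, 2),   # reserved (e.g. EBDA)
    (0x00100000, 0x3FF00000, 1),   # 1MB up to 1GB
])
print(usable_bytes(parse_smap(raw)) // 1024, "KB usable")  # → 1048191 KB usable
```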

For all intents and purposes, there is never going to be an
amd64-compatible system that doesn't have at least this level of
functionality.

When I was doing the amd64 kernel boot code, I was faced with all the
VM86 nastiness in the kernel.  I had to do it another way.  I realized
that since the loader was already getting the memory map itself, and
since it was running purely in 32 bit mode, then it made sense to
simply pass the bios smap data that the loader already had through
to the kernel.
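The pass-through design can be modeled in a few lines. This is a sketch only and is not FreeBSD code. The dict stands in for the loader's boot metadata. The names `loader_metadata`, `kernel_size_memory`, `KernelPanic`, and the `"smap"` key are all hypothetical. The model also shows the catch: the amd64 kernel side has nothing to fall back on if the loader hands over no map.

```python
class KernelPanic(Exception):
    """Stands in for a kernel panic in this model."""


def loader_metadata(smap_raw):
    """Loader side: it already holds the raw SMAP bytes (or None if the
    BIOS refused), so it simply attaches them to the boot metadata."""
    metadata = {}
    if smap_raw:
        metadata["smap"] = bytes(smap_raw)
    return metadata


def kernel_size_memory(metadata):
    """Kernel side: amd64 cannot call the BIOS itself, so a missing map
    from the loader is fatal."""
    smap_raw = metadata.get("smap")
    if smap_raw is None:
        raise KernelPanic("no BIOS SMAP info from loader")
    return smap_raw
```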

But here's where it went horribly wrong.  Over recent years, bios
makers have put more and more hacks into the bios code.  The bioses
themselves sometimes switch from 16 bit real mode to 32 bit protected
mode and then back again.  They do this to emulate things like driver
floppies, usb and cdrom boot, etc etc.  The frequency of this is
increasing rather than decreasing.

And here's the rub.  If we call a bios function in vm86 mode, the bios
code *CANNOT* switch to 32 bit protected mode.  Usually what we see is
that you get a BTX fault.  This is because vm86 trapped an illegal or
privileged instruction, and BTX reports the problem.  We've seen bios
vendors start to put code that TESTS to see if it is being called in
vm86 mode, and either silently fail or return an error, rather than
cause btx crashes etc.

And this is where it bites us.  Some bios vendors decided that the
0xe820 call needed this treatment.  This is the bios SMAP call.  When
the loader calls the memory map functions via a vm86 bios call, the
bios returns an error.  The loader then falls back to the ancient bios
calls and limps along.  Of course, we can't pass nonexistent SMAP data
to the kernel, so when the kernel starts, it panics.

There is work afoot that solves this.  There should also be more seatbelts.

First and foremost, John has done a non-vm86 version of btx.  This
completely and utterly solves the root cause of the problem.  int 0x15
function 0xe820 will get called in real mode, just like windows,
linux, grub, netbsd, old freebsd bootblocks etc do.  Our boot code
will behave just like everybody else's and we won't have these strange
freebsd-specific problems anymore.
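The int 0x15 function 0xe820 call follows a simple register protocol. The caller sets EAX=0xE820, puts the signature "SMAP" (0x534D4150) in EDX, and sets EBX to a continuation value, which is 0 on the first call. ECX holds the buffer size, and ES:DI points at the buffer. On success the BIOS clears the carry flag, echoes the signature in EAX, and returns the next continuation value in EBX. EBX is 0 after the last entry. The Python below models only that iteration logic. The `int15` parameter is a hypothetical callable standing in for the real BIOS call, which boot code issues from 16-bit assembly. A BIOS that refuses vm86 callers looks like a first call that fails.

```python
SMAP_SIG = 0x534D4150  # ASCII "SMAP": passed in EDX, echoed back in EAX


def read_e820(int15):
    """Collect the memory map by repeatedly calling int 0x15, EAX=0xE820.

    `int15` is a hypothetical stand-in for the BIOS call: it takes a dict of
    input registers and returns (carry_flag, out_regs, entry_bytes).
    Returns a list of raw entries, or None if the very first call fails.
    """
    entries = []
    ebx = 0                                   # continuation: 0 on the first call
    while True:
        cf, regs, entry = int15({"eax": 0xE820, "edx": SMAP_SIG,
                                 "ebx": ebx, "ecx": 20})
        if cf or regs["eax"] != SMAP_SIG:
            if not entries:
                return None                   # no SMAP at all
            break                             # some BIOSes end the list this way
        entries.append(entry)
        ebx = regs["ebx"]
        if ebx == 0:                          # EBX = 0 marks the last entry
            break
    return entries


# Example: a BIOS that refuses the call outright yields no map.
print(read_e820(lambda regs: (True, {"eax": 0, "ebx": 0}, b"")))  # → None
```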

(The downside of this change is that bad bios code won't cause BTX
faults anymore.  Bios crashes will reset the machine instead of
reporting a btx fault that we can debug.)

Secondly, loader should report the missing SMAP data before starting
an amd64 kernel.  I've been meaning to do this for a while.  If there
is no SMAP data, explain the problem right there rather than letting
the kernel blow up.
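That seatbelt amounts to a preflight check in the loader. Here is a minimal sketch. It assumes the loader knows which architecture the kernel is and holds whatever SMAP entries it collected. The function `preflight_amd64` and the message wording are hypothetical.

```python
NO_SMAP_MESSAGE = (
    "The BIOS did not provide a memory map (int 0x15, function 0xe820).\n"
    "An amd64 kernel cannot size memory without it, so it was not started."
)


def preflight_amd64(kernel_arch, smap_entries):
    """Return an explanatory message instead of booting when an amd64
    kernel would start without SMAP data; return None when it is safe.

    Only amd64 is refused: an i386 kernel can still size memory itself
    through its vm86 fallbacks.
    """
    if kernel_arch == "amd64" and not smap_entries:
        return NO_SMAP_MESSAGE
    return None
```

The check keys on "no entries" rather than on which sizing method the loader ended up using. Either way, an amd64 kernel only cares whether a map exists to hand over.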

Third, we might be able to generate a fake SMAP table in a sort of
limp-along mode.  E.g., if the bios doesn't have it, use the legacy
memory sizing code in loader to generate a fake table and pass that
through to the kernel.  The kernel might be stuck with only 512MB, but
it might be better than nothing.  It might be possible to use some
getenv-type calls to override the limited data.
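A limp-along table could look roughly like this. The sketch assumes the loader has the classic legacy figures. The first is conventional memory in KB, as int 0x12 reports it. The second is extended memory above 1MB in KB, as the older int 0x15 calls report it. `fake_smap` is hypothetical. Its `override_ext_kb` parameter is also hypothetical and stands in for a getenv-style override. The legacy calls carry no type information, so every range is marked usable and nothing is marked as reserved for ACPI. That is part of why this is only a fallback.

```python
USABLE = 1          # E820 type code for ordinary usable RAM
ONE_MB = 1 << 20


def fake_smap(base_kb, ext_kb, override_ext_kb=None):
    """Build a limp-along SMAP table from legacy memory sizes.

    base_kb:         conventional memory below 640K, in KB.
    ext_kb:          memory starting at 1MB, in KB.
    override_ext_kb: optional user-supplied replacement for ext_kb
                     (hypothetical getenv-style override).
    Returns a list of (base, length, type) tuples; the 640K-1MB hole is
    simply left out of the table.
    """
    if override_ext_kb is not None:
        ext_kb = override_ext_kb
    table = [(0, base_kb * 1024, USABLE)]
    if ext_kb > 0:
        table.append((ONE_MB, ext_kb * 1024, USABLE))
    return table
```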

I think the work John has done to change the bios call method in BTX
is the right solution though.  I don't know what the MFC potential is.
If it doesn't get backported, then the other hacks / workarounds /
seatbelts might be in order for older branches.

-- 
Peter Wemm - peter at wemm.org; peter at FreeBSD.org; peter at yahoo-inc.com
"All of this is for nothing if we don't go to the stars" - JMS/B5
"If Java had true garbage collection, most programs would delete
themselves upon execution." -- Robert Sewell

