i386/122668: FreeBSD boot loader doesn't work on Dell R900 (+workaround)

Mike Hibler mike at flux.utah.edu
Fri Apr 11 19:10:01 UTC 2008


>Number:         122668
>Category:       i386
>Synopsis:       FreeBSD boot loader doesn't work on Dell R900 (+workaround)
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-i386
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Apr 11 19:10:01 UTC 2008
>Closed-Date:
>Last-Modified:
>Originator:     Mike Hibler
>Release:        6.2-RELEASE
>Organization:
University of Utah, Flux Research Group
>Environment:
N/A
>Description:
As far as I can tell, this isn't a bug in the FreeBSD boot loader; rather, it is
a bug in the Dell BIOS.  However, googling around I see that other people have
seen this problem, and since I have worked around it, I thought I would report it.

Note also that I am seeing this bug in the Emulab bootloader, which is derived
from the FreeBSD 6.2-RELEASE version of the boot loader, but based on the posts
I have seen, I believe the problem would be the same in the stock FreeBSD
boot loader.

The symptom is that when I try to boot over the network using PXE
(currdev="pxe0:"), the loader complains that it "cannot load kernel".

The problem is that on this machine one of the BIOS calls (int15/fn0xe820) in
bios_getsmap (src/sys/boot/i386/libi386/biossmap.c) returns more than
the 20 bytes of data it is supposed to: it appears to return the value 0x09
in the 21st byte (or 24th, I forget my little-endian lore).  As the data are
being read into a 20-byte static buffer, the variable that follows it in
memory gets clobbered.  In this case 'smap' is the buffer, and the following
BSS-allocated variable is 'smapbase':

static struct bios_smap smap;
static struct bios_smap *smapbase;

smapbase points to the dynamically allocated area into which the individual
smap entries are copied via:

                bcopy(&smap, &smapbase[smaplen], sizeof(struct bios_smap));

What I see then is that the first couple of iterations of read-an-entry,
copy-to-buffer work fine, but then one call returns the extra data and
the low-order byte of smapbase changes from something like 0xb4 to 0x09.
The result is still a legitimate address, so the bcopy proceeds without
incident, but the smap entry data wind up getting bcopy()ed to an earlier
address, overwriting other malloc()ed memory.

In this case it is overwriting some entries in the 'environ' environment
linked list, corrupting the chain.  The result is that I no longer have
a "currdev" environment variable, so the loader tries to load from the
default device (the hard drive) rather than the network.  Since there is
nothing on the hard drive, it cannot read loader.rc or boot.conf or ..., and
ultimately winds up trying to load "kernel", which fails with an error.

Note that there are two read-data loops in this function, and the problem
occurs in the first loop as well, but since smapbase has not yet been
initialized at that point (and no bcopy happens in that loop), the clobbered
value is never used.

Note also that one post I read mentioned that another BSD boots fine on
this machine.  That could be because that BSD reads the data directly
into the smapbase buffer rather than via a temporary smap buffer.  In that
case, the byte clobbered with 0x09 would just be in a yet-to-be-filled,
later part of the smapbase buffer.

>How-To-Repeat:
Try PXE-booting a Dell R900 over the network.
>Fix:
The workaround is to (arbitrarily) pad the temporary smap buffer with
another 4 bytes.  I tried padding with up to an extra 32 bytes, but never saw
more than the single overwrite, and that was always within the first 4 bytes
(bytes 0-3) after the buffer.


>Release-Note:
>Audit-Trail:
>Unformatted:

