vbox + gpxe + pxeboot = fail

Lawrence Stewart lstewart at freebsd.org
Sat Jul 10 07:04:37 UTC 2010


On 07/10/10 15:26, Lawrence Stewart wrote:
> Hi All,
> 
> I had some frustration trying to get FreeBSD to pxeboot inside a vbox VM
> a while back. The thread is available here:
> 
> http://lists.freebsd.org/pipermail/freebsd-emulation/2010-April/007681.html
> 
> I left things for a while and came back to them yesterday with some
> fresh resolve to nut the problem out. I have some new insights I wanted
> to share.
> 
> I'm using gpxe 1.0.1 from
> http://kernel.org/pub/software/utils/boot/gpxe/ and doing the builds on
> a Debian VM. To create a rom for the vbox AMD adapter types, I'm
> following the details at:
> 
> http://www.etherboot.org/wiki/romburning/vbox
> 
> I turned the instructions to pad the rom into a python script you can
> grab from here:
> 
> http://people.freebsd.org/~lstewart/misc/vbox/rompad.py
> 
> 
> 
> Here's what I've figured out so far:
> 
> - The problem stems from the pxe boot rom environment provided by gpxe.
> It sends and receives packets correctly, but somehow the IP addresses
> get mangled (I think this happens inside gpxe) so it thinks the replies
> it is waiting for should be coming in on one IP address when they
> actually arrive on the real valid IP address.
> 
> - The binary-only vbox on Win XP, which uses the Intel pxe boot rom,
> pxeboots without problems, which is further evidence that this is
> isolated to gpxe.
> 
> - By changing the line
> "if (udpread_p->status > 0) {"
> to
> "if (udpread_p->status > 1) {"
> in sys/boot/i386/libi386/pxe.c, our pxeboot is able to work around the
> problem and I can pxeboot FreeBSD just fine. gpxe is therefore correctly
> reading the packets off the wire and passing them to our pxeboot code.
> It just sets the failure status code because the IP address mismatch
> makes it think the packet is not the one we were waiting for.
> 
> 
> 
> The file in the gpxe distribution that I've been adding debug printf's
> to is: src/arch/i386/interface/pxe/pxe_udp.c
> By doing a "%s/DBG/dbg_printf/g" in that file, you get debugging output
> that shows you the failures and the IP address it thinks the pkt should
> be coming in on. In my case, it correctly sends UDP packets to
> 172.16.7.21, and then waits for the reply on 172.16.7.50 (but sees that
> the reply actually comes in on 172.16.7.43 which is the IP of the VM).
> Because .43 != .50, gpxe returns a failure status (i.e. 1), but it still
> correctly reads the pkt and passes it to our pxeboot, which is why my hack
> of ignoring the status allows things to work.
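
The status handling described in the quoted message can be modelled in a
few lines of C. This is only a sketch: model_udp_read_status() and the two
loader_accepts_*() helpers are hypothetical names, and the filter is a
guess at the behaviour seen in the debug output, not gpxe's or pxeboot's
actual code.

```c
#include <stdint.h>

/* Build a host-order IPv4 address from four octets (illustration only). */
#define IP4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

/*
 * Hypothetical model of the PXE-side filter: report 0 when the packet's
 * destination matches the address the stack believes is local, else 1.
 * This is NOT gpxe's code, only the behaviour observed in the debug output.
 */
static uint16_t model_udp_read_status(uint32_t believed_local_ip,
                                      uint32_t packet_dst_ip)
{
    return (believed_local_ip == packet_dst_ip) ? 0 : 1;
}

/* Stock loader check: any non-zero status means the read failed. */
static int loader_accepts_stock(uint16_t status)
{
    return !(status > 0);
}

/* Hacked loader check: status 1 is tolerated, larger values still fail. */
static int loader_accepts_hacked(uint16_t status)
{
    return !(status > 1);
}
```

With the addresses from the debug output (believed local IP 172.16.7.50,
packet destination 172.16.7.43), the modelled status is 1, so the stock
check rejects the packet and the hacked check accepts it.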

Some more useful info...

gpxe has the correct IP address at the beginning when printed from
"pxenv_udp_open()". Then in FreeBSD's pxe_open() in
sys/boot/i386/libi386/pxe.c, the "if (rootip.s_addr == 0) {" statement
evaluates to true, so we do another DHCP exchange by calling
"bootp(pxe_sock, BOOTP_PXE);" in order to get our own copy of the various
DHCP variables required to NFS boot.

I think it's this second DHCP exchange that is somehow wiping gpxe's
concept of the local IP, because gpxe printfs done after the FreeBSD
pxe_open() call show the bogus local IP. Sure enough, if I comment out
the bootp() call and hardwire the rootip and rootpath, everything works
as expected.
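
The hardwiring workaround amounts to making sure the "is the root IP still
zero?" test is false before the loader would otherwise start its own DHCP
exchange. A minimal sketch in C follows. struct nfs_root and both helpers
are hypothetical names, and the address and path used in the usage note
are placeholders, not the values from this setup.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the two values the loader needs to NFS boot. */
struct nfs_root {
    uint32_t ip;        /* 0 means "not known yet" */
    char path[64];
};

/* Mirrors the "rootip.s_addr == 0" test: is a second DHCP exchange needed? */
static int needs_dhcp_exchange(const struct nfs_root *r)
{
    return r->ip == 0;
}

/* Hardwire both values so the DHCP exchange is skipped. */
static void hardwire_nfs_root(struct nfs_root *r, uint32_t ip, const char *path)
{
    r->ip = ip;
    strncpy(r->path, path, sizeof(r->path) - 1);
    r->path[sizeof(r->path) - 1] = '\0';   /* strncpy may not terminate */
}
```

Calling hardwire_nfs_root(&r, 0xC0000201u, "/export/root") on a zeroed
struct (192.0.2.1 is a documentation address) makes needs_dhcp_exchange()
return 0, which corresponds to skipping the bootp() call.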

The reason "rootip.s_addr == 0" is true appears to be that the
"pxe_call(PXENV_GET_CACHED_INFO);" in our pxeboot's pxe_init() fails to
pull cached values from gpxe. I tried upping PXE_BUFFER_SIZE in case we
weren't supplying a large enough buffer to receive all the cached data,
but that didn't make any difference.
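
One way to see why a bigger buffer might not help is to separate the ways
a cached-info request can come back unusable. The sketch below is a
simplified model in C. struct cached_info_result and cached_info_usable()
are hypothetical and do not reproduce the real PXE parameter block layout.

```c
#include <stdint.h>

/* Simplified, hypothetical view of a cached-info reply (not the real layout). */
struct cached_info_result {
    uint16_t status;       /* 0 = success */
    uint16_t bytes_copied; /* how much cached packet data was returned */
};

/*
 * Returns 1 when the reply looks usable: the call succeeded, some data came
 * back, and it fits in the buffer the loader supplied.
 */
static int cached_info_usable(const struct cached_info_result *res,
                              uint16_t buffer_capacity)
{
    if (res->status != 0)
        return 0;               /* the call itself failed */
    if (res->bytes_copied == 0)
        return 0;               /* success reported, but nothing cached */
    return res->bytes_copied <= buffer_capacity;
}
```

Only the last condition depends on the buffer size. If the request fails
outright or returns no data, enlarging the buffer changes nothing, which
would be consistent with the observation above.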

So it appears there are two separate issues in our pxeboot's interaction
with gpxe. First, we fail to extract the cached DHCP data from gpxe,
which causes us to do our own DHCP exchange in pxe_open() to get the
info. Second, by doing the extra DHCP exchange, we're somehow trampling
gpxe's concept of what the local IP is.

Cheers,
Lawrence

