kexec or similar for FreeBSD

Russell Cattelan cattelan at thebarn.com
Wed Nov 9 02:55:41 UTC 2011


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 11/8/11 4:10 PM, Andriy Gapon wrote:
> on 08/11/2011 23:14 Russell Cattelan said the following:
>> On 11/6/11 6:23 AM, Andriy Gapon wrote:
>>> on 24/10/2011 20:55 Russell Cattelan said the following:
>>>> So it has been a while and a lot of hair pulling but kload is
>>>> sorta alive and kicking. It can now load the kernel from
>>>> userspace, copy it over the running kernel and jump to the
>>>> kernel entry point.
>>>> 
>>>> I'm still having problems getting through the boot process
>>>> due to interrupts arriving for unconfigured handlers. Fatal
>>>> Trap (30)
>> 
>>> Just in case, is your original kernel running SMP?
>> 
>> I'm working on the SMP stuff now. Trying to get the processors in
>> a state where the restart process can complete.
>> 
>> For now I removed the panic call in the unknown interrupt case.
>> 
>> 
>> What I finally figured out was that starting up the system was
>> overwriting the page tables, and any APs still looking at those
>> locations caused qemu / kvm to reset  :-(
> 
> Very interesting. You might also find the following information
> useful in case you haven't implemented that yet:
> http://www.intel.com/design/pentium/datashts/242016.htm 
> specifically the Appendix B.5.  That is something that we are not
> doing right now, but what I would prefer us doing even for a
> "normal" warm reboot.
> 
> Namely: In order to do a complete system shutdown, followed by a
> warm restart if necessary, the operating system should return the
> system to a state similar to that at power-on. This includes
> disabling the Local APIC interrupts (LINT0/LINT1/Local APIC
> Timer/Error interrupt) on all processors, disabling the Local APIC
> on all APs and disabling all interrupts at all the I/O APICs in the
> system.
Ya, I have been slowly figuring that out.
I have added a simple routine to tear down the ioapic handlers which
seems to be doing the right thing. I do not get the unhandled
interrupt message now.
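
For illustration, a teardown of that kind could look roughly like the
sketch below. It is not the kload code: kload_ioapic_mask_all() and the
fake_* accessors are hypothetical names, and the register indices are
taken from the Intel 82093AA I/O APIC datasheet. The accessors are
passed in so the loop can be tried in user space; on hardware they
would go through the IOREGSEL/IOWIN window.

```c
#include <stdint.h>

#define IOAPIC_VER	0x01		/* version register index */
#define IOAPIC_REDTBL	0x10		/* first redirection table index */
#define IOART_INTMASK	0x00010000u	/* mask bit, low dword of an entry */

/* Indirect register accessors (hypothetical signatures). */
typedef uint32_t (*ioapic_read_t)(void *ctx, uint32_t reg);
typedef void (*ioapic_write_t)(void *ctx, uint32_t reg, uint32_t val);

/*
 * Set the mask bit in every redirection table entry of one I/O APIC.
 * Returns the number of pins that were masked.
 */
static unsigned
kload_ioapic_mask_all(void *ctx, ioapic_read_t rd, ioapic_write_t wr)
{
	/* Bits 23:16 of the version register hold the highest pin number. */
	unsigned npins = ((rd(ctx, IOAPIC_VER) >> 16) & 0xff) + 1;
	unsigned pin;

	for (pin = 0; pin < npins; pin++) {
		/* Each 64-bit entry occupies two 32-bit registers. */
		uint32_t reg = IOAPIC_REDTBL + 2 * pin;

		wr(ctx, reg, rd(ctx, reg) | IOART_INTMASK);
	}
	return (npins);
}

/* In-memory stand-in for the register file, for user-space experiments. */
static uint32_t
fake_read(void *ctx, uint32_t reg)
{
	return (((uint32_t *)ctx)[reg]);
}

static void
fake_write(void *ctx, uint32_t reg, uint32_t val)
{
	((uint32_t *)ctx)[reg] = val;
}
```

Only the low dword of each entry is touched, so the destination field
in the high dword is left as it was.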

Sending an IPI cpustop didn't quite do what I expected, in that the cpu
is not really stopped but just "pause"d. So what ended up happening was
that cpus 1+ were still using the initial page tables from the first
boot, while cpu 0 had a different page table set up by the kload
process. BUT when the boot process on cpu 0 started setting up the page
tables again, in the same memory that cpus 1+ were still referencing
for their page tables, qemu / kvm would reset and reboot the VM.

It took forever with lots of debug prints in both the kernel and qemu
to finally put the pieces together.

Changing the cpususpend routine to actually halt the cpu has finally
allowed the boot process to actually work using kload on a multi cpu
qemu vm.

Unfortunately it appears that VirtualBox does not handle things the
same way and now panics when trying to start the APs.

I'm guessing it has to do with exactly what you are saying and that
the local APICs need to be shut down properly. The Linux kexec
process does that.
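
As a rough sketch of what "shut down properly" could mean for the
local APIC, following the MP spec text quoted above: mask the four LVT
entries on every cpu, and additionally clear the software enable bit
on the APs. This is an illustration only, not existing FreeBSD or
kload code. kload_lapic_quiesce() is a hypothetical name, the
constants are defined locally from the Intel SDM register layout, and
it assumes the memory-mapped xAPIC register page (not x2APIC MSRs).

```c
#include <stddef.h>
#include <stdint.h>

/* Byte offsets into the local APIC register page. */
#define LAPIC_SVR	0x0f0	/* spurious interrupt vector register */
#define LAPIC_LVT_TIMER	0x320
#define LAPIC_LVT_LINT0	0x350
#define LAPIC_LVT_LINT1	0x360
#define LAPIC_LVT_ERROR	0x370

#define APIC_LVT_M	0x00010000u	/* LVT mask bit */
#define APIC_SVR_ENABLE	0x00000100u	/* APIC software enable bit */

/*
 * Meant to run on the cpu that owns this local APIC. Masks the
 * LINT0/LINT1/timer/error LVT entries; on an AP it also software
 * disables the local APIC.
 */
static void
kload_lapic_quiesce(volatile uint32_t *lapic, int is_ap)
{
	static const uint32_t lvts[] = {
		LAPIC_LVT_TIMER, LAPIC_LVT_LINT0,
		LAPIC_LVT_LINT1, LAPIC_LVT_ERROR
	};
	size_t i;

	for (i = 0; i < sizeof(lvts) / sizeof(lvts[0]); i++)
		lapic[lvts[i] / 4] |= APIC_LVT_M;

	if (is_ap)
		lapic[LAPIC_SVR / 4] &= ~APIC_SVR_ENABLE;
}
```

The BSP keeps its local APIC enabled here because the quoted text only
asks for the APs to be disabled.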


> 
> I believe that this could be a reason for the spurious interrupts
> that you get. BTW, I am not completely sure, but it seems that we
> never disable the timer interrupt(s) during shutdown (unlike
> interrupts for all/most of other devices).
> 
> You might also find OpenSolaris code interesting in this respect:
> http://fxr.watson.org/fxr/source/i86pc/io/pcplusmp/apic_common.c?v=OPENSOLARIS#L1160
> http://fxr.watson.org/fxr/source/i86pc/os/machdep.c?v=OPENSOLARIS#L191
Ahh good idea ... I've been trying to make sense of the Linux APIC
code to see how to duplicate the functionality, but more examples are
always helpful.


> 
> All the best!

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk6566gACgkQNRmM+OaGhBikggCfZMob4rbk9SQT+YGXksilCmpA
ZnIAnjXyEa2uTVhYNP3SHMCpvWBPxCoP
=pDTQ
-----END PGP SIGNATURE-----


More information about the freebsd-hackers mailing list