consistent VM hang during reboot

Andrew Duane aduane at juniper.net
Thu May 8 18:42:36 UTC 2014


When I was doing some early work on some of the Octeon multi-core chips, I encountered something similar. If I remember correctly, the shutdown sequence neither halted the cores properly nor set up the "start jump" vector. So the first core would start, and when it tried to start the others it would hang waiting for the ACK that they were running (they had no start vector and hence never started). I know MIPS, not AMD, so I can't say what the amd64 equivalent would be, but I'm sure there is one. Check that part: the code that sets up the early state.
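
To make the failure mode above concrete, here is a rough single-threaded C simulation. It is only a sketch: struct core, ap_entry, kick_core and start_secondary_cores are hypothetical names invented for illustration, not Octeon or FreeBSD code. The only difference from the hang described above is that the wait is bounded, so the sketch reports which core never acknowledged instead of spinning forever.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CPUS 8

/* Hypothetical per-core state; names are illustrative, not from any kernel. */
struct core {
    void (*start_vector)(struct core *); /* NULL if shutdown never re-armed it */
    bool running;                        /* the "ACK" the boot core waits for */
};

/* Entry point of a secondary core: announce that it is running. */
static void ap_entry(struct core *c)
{
    c->running = true;
}

/* Models the wake-up signal: a core with no start vector never starts. */
static void kick_core(struct core *c)
{
    if (c->start_vector != NULL)
        c->start_vector(c);
}

/*
 * Start cores 1..ncpus-1 from core 0.  Returns the index of the first core
 * that did not acknowledge within max_spins polls, or -1 if all started.
 * With an unbounded wait, a missing vector turns into a silent hang.
 */
static int start_secondary_cores(struct core *cores, int ncpus, long max_spins)
{
    assert(ncpus >= 1 && ncpus <= MAX_CPUS);

    for (int i = 1; i < ncpus; i++) {
        long spins = 0;

        kick_core(&cores[i]);
        while (!cores[i].running) {
            if (++spins >= max_spins)
                return i; /* this core never came up */
        }
    }
    return -1;
}
```

Calling start_secondary_cores() on an array where one entry has a NULL start_vector returns that entry's index; a real kernel would need volatile or atomic accesses for the running flag, which this single-threaded model leaves out.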

If Juli and/or Adrian are reading this: do you remember anything about that, something like 2 years ago?

....................................
Andrew L. Duane
AT&T Technical Lead
JNCIA - JUNOS
m   +1 603.770.7088
o    +1 408.933.6944 (2-6944)
skype: andrewlduane
aduane at juniper.net


-----Original Message-----
From: owner-freebsd-hackers at freebsd.org [mailto:owner-freebsd-hackers at freebsd.org] On Behalf Of John Nielsen
Sent: Thursday, May 08, 2014 1:56 PM
To: John Baldwin
Cc: freebsd-hackers at freebsd.org; freebsd-virtualization at freebsd.org
Subject: Re: consistent VM hang during reboot

On May 8, 2014, at 11:03 AM, John Baldwin <jhb at freebsd.org> wrote:

> On Wednesday, May 07, 2014 7:15:43 pm John Nielsen wrote:
>> I am trying to solve a problem with amd64 FreeBSD virtual machines running on a Linux+KVM hypervisor. To be honest I'm not sure if the problem is in FreeBSD or the hypervisor, but I'm trying to rule out the OS first.
>> 
>> The _second_ time FreeBSD boots in a virtual machine with more than one core, the boot hangs just before the kernel would normally print e.g. "SMP: AP CPU #1 Launched!" (The last line on the console is "usbus0: 12Mbps Full Speed USB v1.0", but the problem persists even without USB). The VM will boot fine a first time, but running either "shutdown -r now" OR "reboot" will lead to a hung second boot. Stopping and starting the host qemu-kvm process is the only way to continue.
>> 
>> The problem seems to be triggered by something in the SMP portion of cpu_reset() (from sys/amd64/amd64/vm_machdep.c). If I hit the virtual "reset" button the next boot is fine. If I have 'kern.smp.disabled="1"' set for the initial boot then subsequent boots are fine (but I can only use one CPU core, of course). However, if I boot normally the first time then set 'kern.smp.disabled="1"' for the second (re)boot, the problem is triggered. Apparently something in the shutdown code is "poisoning the well" for the next boot.
>> 
>> The problem is present in FreeBSD 8.4, 9.2, 10.0 and 11-CURRENT as of yesterday.
>> 
>> This (heavy-handed and wrong) patch (to HEAD) lets me avoid the issue:
>> 
>> --- sys/amd64/amd64/vm_machdep.c.orig	2014-05-07 13:19:07.400981580 -0600
>> +++ sys/amd64/amd64/vm_machdep.c	2014-05-07 17:02:52.416783795 -0600
>> @@ -593,7 +593,7 @@
>> void
>> cpu_reset()
>> {
>> -#ifdef SMP
>> +#if 0
>> 	cpuset_t map;
>> 	u_int cnt;
>> 
>> I've tried skipping or disabling smaller chunks of code within the #if block but haven't found a consistent winner yet.
>> 
>> I'm hoping the list will have suggestions on how I can further narrow down the problem, or theories on what might be going on.
> 
> Can you try forcing the reboot to occur on the BSP (via 'cpuset -l 0 reboot')
> or a non-BSP ('cpuset -l 1 reboot') to see if that has any effect?  It might
> not, but if it does it would help narrow down the code to consider.

Hello jhb, thanks for responding.

I tried your suggestion but unfortunately it does not make any difference. The reboot hangs regardless of which CPU I assign the command to.

Any other suggestions?

JN


