[Bug 222916] [bhyve] Debian guest kernel panics with message "CPU#0 stuck for Xs!"

Sat Oct 14 17:12:44 UTC 2017

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=222916

--- Comment #6 from Peter Grehan <grehan at FreeBSD.org> ---
>Or does the interaction between bhyve and the host scheduler somehow
>result in the virtual cpus being set aside

 Yes, though:

> for tens of seconds?

 The error message from Linux is a bit misleading. There is a low-priority
kernel thread that tries to run every 5 seconds and then sleeps. If it hasn't
been able to run for an extended amount of time, for example due to high
interrupt activity, higher priority threads running, or spinlocks being held,
the error message will be displayed.
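
 To make the arithmetic behind the "stuck for Xs" figure concrete, here is a
toy sketch of that kind of check. It is not the Linux implementation: the
function names and the 20-second threshold are assumptions chosen only for
illustration.

```python
def starved_seconds(now, last_ran):
    """Seconds since the periodic watchdog thread last managed to run."""
    return now - last_ran


def soft_lockup_message(cpu, now, last_ran, threshold=20):
    """Return a warning string if the thread was starved longer than
    `threshold` seconds, else None.

    `threshold` is an assumed value for this sketch, not a kernel constant.
    """
    starved = starved_seconds(now, last_ran)
    if starved <= threshold:
        return None
    return f"CPU#{cpu} stuck for {starved}s!"
```

 The point is that X measures how long the periodic thread went without
running, not how long a vCPU was actually descheduled by the host.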

 What I believe you are seeing is a classic hypervisor problem, not specific to
bhyve, known as "lock-holder preemption": a vCPU holding a spinlock is
preempted by the host, and the other vCPUs that are still running then spin
trying to acquire a lock that can't be released until the holder is scheduled
again. A search will show the large amount of literature on this issue :)
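
 As a rough illustration of why this hurts, here is a toy model in Python. It
is a sketch under simplifying assumptions, not bhyve or Linux code: time is in
abstract ticks, vCPU 0 holds the lock from tick 0, vCPU 1 is always scheduled
and spins, and the function and parameter names are made up for the example.

```python
def spin_ticks(hold_ticks, preempt_at, preempt_len):
    """Ticks vCPU 1 burns spinning before vCPU 0 releases the lock.

    hold_ticks  -- run time vCPU 0 needs inside its critical section
    preempt_at  -- tick at which the host deschedules vCPU 0
    preempt_len -- how many ticks vCPU 0 stays descheduled
    """
    remaining = hold_ticks
    tick = 0
    spun = 0
    while remaining > 0:
        holder_running = not (preempt_at <= tick < preempt_at + preempt_len)
        if holder_running:
            remaining -= 1  # the holder makes progress only while scheduled
        spun += 1           # the waiter spins on every tick regardless
        tick += 1
    return spun
```

 With no preemption, spin_ticks(3, 1, 0) gives 3: the waiter spins only for the
length of the critical section. With the holder descheduled for 1000 ticks,
spin_ticks(3, 1, 1000) gives 1003: a three-tick critical section costs the
waiter the entire preemption interval as well.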

 Maybe the best reading on this is the ESXi scheduler paper:

http://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/vmware-vsphere-cpu-sched-performance-white-paper.pdf

 There has been some talk of putting knowledge of vCPUs in the FreeBSD
scheduler to allow some form of gang scheduling, but nothing has come of that
so far.

 As to your point: the hypervisor scheduler has to provide more than just
fairness - heuristics about guest OS behaviour are also needed.
