Re: bhyve 13.1 compatibility breakage

From: Mark Johnston <markj_at_freebsd.org>
Date: Wed, 08 Feb 2023 19:24:09 UTC
On Wed, Feb 08, 2023 at 11:08:31AM -0800, John Baldwin wrote:
> On 2/8/23 10:05 AM, Mark Johnston wrote:
> > On Sun, Jan 15, 2023 at 10:46:21AM -0500, Mark Johnston wrote:
> >> On Wed, Nov 23, 2022 at 08:00:16AM +0000, Corvin Köhne wrote:
> >>> The branch main has been updated by corvink:
> >>>
> >>> URL: https://cgit.FreeBSD.org/src/commit/?id=7c326ab5bb9aced8dcbc2465ac1c9ff8df2ba46b
> >>>
> >>> commit 7c326ab5bb9aced8dcbc2465ac1c9ff8df2ba46b
> >>> Author:     Corvin Köhne <corvink@FreeBSD.org>
> >>> AuthorDate: 2022-11-21 14:00:04 +0000
> >>> Commit:     Corvin Köhne <corvink@FreeBSD.org>
> >>> CommitDate: 2022-11-23 08:00:04 +0000
> >>>
> >>>      vmm: don't lock a mtx in the icr_low write handler
> >>>      
> >>>      x2apic accesses are handled by a wrmsr exit. This handler is called in a
> >>>      critical section. So, we can't lock a mtx in the icr_low handler.
> >>>      
> >>>      Reported by:            kp, pho
> >>>      Tested by:              kp, pho
> >>>      Approved by:            manu (mentor)
> >>>      Fixes:                  c0f35dbf19c3c8825bd2b321d8efd582807d1940 vmm: Use a cpuset_t for vCPUs waiting for STARTUP IPIs.
> >>>      MFC after:              1 week
> >>>      MFC with:               c0f35dbf19c3c8825bd2b321d8efd582807d1940
> >>>      Sponsored by:           Beckhoff Automation GmbH & Co. KG
> >>>      Differential Revision:  https://reviews.freebsd.org/D37452
> >>> ---
> >>
> >> Hi Corvin,
> >>
> >> This seems to break AP startup when using bhyve/libvmmapi from FreeBSD
> >> 13.1 on a kernel built from main.  It looks like the commit somehow
> >> regresses commit 769b884e2e2, but I'm not sure yet exactly how.
> > 
> > I debugged this further and am not quite sure how to fix the problem,
> > which isn't specific to this commit after all.  I'll try to describe it
> > here.
> > 
> > Suppose we're booting a VM with 2 vCPUs. When the BSP raises the INIT
> > IPI to start AP 1, vlapic_icrlo_write_handler() looks up the destination
> > vCPU with vlapic_calcdest(), which only returns active vCPUs.  However,
> > old bhyve executables activate APs (i.e., call vm_activate_cpu())
> > lazily, only upon receiving a VM_EXITCODE_SPINUP_AP message.  Thus,
> > vm_handle_ipi() simply doesn't doesn't do anything since "dmask" is
> > empty, so APs don't boot up.
> > 
> > To further complicate things, new vmm.ko allocates vCPUs lazily.  New
> > bhyve executables call vm_activate_cpu() for all vCPUs before running
> > the BSP, but as I said above, old bhyve executables do not.  So merely
> > fixing "dmask" in vlapic_icrlo_write_handler() to include
> > not-yet-activated vCPUs doesn't work, and we can't allocate a new vCPU
> > in that context.  In general it seems that we want an INIT IPI to
> > trigger allocation of a vcpu structure to preserve compatibility with
> > old bhyve, but I don't see a good way to implement that.
> > 
> > I would quite like to fix this since I make heavy use of 13.1-RELEASE
> > bhyve+jails on a kernel running main.  I believe bhyve from stable/13 is
> > unaffected, but 13.2 is not yet released.  Any suggestions would be
> > appreciated.
> 
> Hmm, I thought I had fixed this by using the bitmask of started CPUs
> rather than requiring the vCPU to be allocated.  I was definitely testing
> an old bhyve binary from head against the vCPU branch while working on it
> and remember hitting this exact case, but I thought I had fixed it.  Oh,
> hmm, my fix was the commit quoted above (769b884e2e2).

After looking further, I think this is actually not so painful to fix.
I have a patch which seems to work, but I need to test further.

Lazy allocation of the vcpu structure is mostly fine.  The only problem
is that the INIT handler in vm_handle_ipi() wants to reinitialize the
vlapic on all destination vCPUs and this doesn't work if any are not yet
allocated.  But if they're not yet allocated, then we can rely on a
later vm_alloc_vcpu() to initialize vlapic state, so it doesn't really
matter.

So if I change vlapic_calcdest() to report inactive destination CPUs for
INIT and STARTUP IPIs, and I change vm_handle_ipi() to only invoke
vlapic_handle_init() on active vCPUs, and I fix a small bug in this
commit (CPU_FFS() is 1-indexed not 0-indexed) then I can boot multicore
VMs using a 13.1-RELEASE bhyve again.