Re: bhyve -D not cleaning up after itself

From: Mark Johnston <markj_at_freebsd.org>
Date: Mon, 29 Nov 2021 21:19:59 UTC
On Sat, Nov 27, 2021 at 02:40:57AM -0500, David E. Cross wrote:
> I have noticed for a while that bhyve -D doesn't seem to actually do what
> is claimed (to destroy a VM on guest-initiated power-off).  This
> evening I decided to ktrace it to see if I was just not getting 
> something about how this was supposed to work, and found:
> 
> 
>   68613 vcpu 0   CALL  __sysctlbyname(0x1ebcdb20a133,0xe,0,0,0x1ebce4ba60f0,0x9)
>   68613 vcpu 0   SCTL "hw.vmm.destroy"
>   68613 vcpu 0   RET   __sysctlbyname -1 errno 1 Operation not permitted
>   68613 vcpu 0   CALL  exit(0x1)
> 
> 
> Quickly reading the kernel code for vm_destroy(), I find two candidates:
> 
> static int
> vmm_priv_check(struct ucred *ucred)
> {
> 
>          if (jailed(ucred) &&
>              !(ucred->cr_prison->pr_allow & pr_allow_flag))
>                  return (EPERM);
> 
>          return (0);
> }
> 
> This doesn't seem to be it, my process is not jailed.
> 
> That leads to the only other (I think) call in sysctl_vmm_destroy that 
> could return EPERM:
> 
> error = sysctl_handle_string(oidp, buf, buflen, req);
> 
> 
> But I am just not seeing it.  Also this EXACT same call works from the 
> context of bhyvectl --vm=FOO --destroy, run from the same shell as the 
> bhyve process that just terminated.  Is the 'ctx' somehow incorrect in
> bhyve?  It is used earlier in that function, so I am assuming it is right?

The problem is that bhyve runs in capability mode (see capsicum(4)),
which restricts access to the sysctl namespace.  In particular, most
sysctls are not accessible, including hw.vmm.destroy, so -D is
effectively broken.

One possible solution is to spawn an unsandboxed helper process which
can toggle the sysctl on bhyve's behalf.  That is a rather heavyweight
solution, though.

Earlier this year some work was done on using a file descriptor-based
interface to create and destroy VMs, moving away from the old
sysctl-based interface.  It's stalled at the moment but I hope to return
to that work quite soon.  That should also help fix the problem but will
take some time to complete.

I think it may be easiest to simply allow writes to the sysctl for the
time being: https://reviews.freebsd.org/D33169