From nobody Fri Nov 07 14:59:11 2025 X-Original-To: dev-commits-src-branches@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4d32Jb67cBz6G3Br; Fri, 07 Nov 2025 14:59:11 +0000 (UTC) (envelope-from git@FreeBSD.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "mxrelay.nyi.freebsd.org", Issuer "R12" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4d32Jb2SQLz3P41; Fri, 07 Nov 2025 14:59:11 +0000 (UTC) (envelope-from git@FreeBSD.org) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org; s=dkim; t=1762527551; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=DGPdD9IPwf7AeaOBJFo2e/atB2Dj4Xpl9b/3pxrA8GU=; b=KY6MMTti7r8ZFwliBdWP4GXXedbXlRjW5JcD+DuxIpx2tUH6+WjPpoNjiI95rqSw2uZlHy HiX4mBpdD8+K8flYjWY2h76ewrxzI2D6kuj+7ybR4t3A+TnwW87AI9udl7KzfW5v+FkgA6 XbcQ/efvz/dveYnilHdTuyPSKQd895TSXld8Ly4RrxnWXnDUOAwxE4vhMTf6qyJqBnXvIP Y79RtOZOW8lWFZqjPh0CaIJtEFea2RuruMwHzhdHydpg1MGfLp3zCpwP2xUl/vLkClExoP DfB5EQttJacKGhQSunL9yLF44lFAECHw5iUNfmc80/jHxd7J1dBpCWx+4e8pgQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org; s=dkim; t=1762527551; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=DGPdD9IPwf7AeaOBJFo2e/atB2Dj4Xpl9b/3pxrA8GU=; b=JJWdG2yQJVNpGk0wdNOOiSE4tSBDL2IOMCQu10g/H0HhmJoso6czY2Kls5LlgtNNNCLpwG RN0+B9wW1FqnbJCfC3gUAiAbWTHuUgWmgHe4b4UqodcNq+KY6JMjUDb+hDfseZg5hnc2UC Vqj9slTGV6pU9zmRV/qPkhGI9ZmPyIv3Y74WUHazRhUGGjUEcWMUggGvGQ23Pm8dusUOcG /jIFg7fby2S+ejn9PpAgcTauu2z6MqzV/A6gXLR4PAsxATiMZgGj7yBAze6oEvv0KyyBnA IGOiykZ5XDkVPRg8ngpCmCuDP4OBgbxLihoIbRtn+HcNQvge9wpJvm+bJ8q+QA== ARC-Seal: i=1; s=dkim; d=freebsd.org; t=1762527551; a=rsa-sha256; cv=none; b=q3EOF1z6ygBuE0qGDt3dp/3TymrA+8kT7bWzyNlJK+9K1jDPcNNz5CEGxF+KX3Ik7nz5Yg aVCT3jrPy0Xw9oL/ubmOp2mOqMJCVXQIBFJF0zVU+hgDPau9CaOloPVWtyTBoqPWcGytbn PIp3zxj3Kr5y3XoAjzWp6TYxC6CiyLnr7i6BeTOACgU8NP5cjGqesVNiJMWkaHgJeYPgR2 YoH02rnsLuHQqzOcN2Q8nbCGNlIq24cuC97PxvKdv4pLTrbxqnLBCr+NQKrrFhiYJBpLU1 rcTPGxYTmi7v67V5rEzMW+rLd8HAEPdJeFERjXMVxqr3UTDUfN+Sgl7iU5+cGA== ARC-Authentication-Results: i=1; mx1.freebsd.org; none Received: from gitrepo.freebsd.org (gitrepo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:5]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id 4d32Jb1vndz13mm; Fri, 07 Nov 2025 14:59:11 +0000 (UTC) (envelope-from git@FreeBSD.org) Received: from gitrepo.freebsd.org ([127.0.1.44]) by gitrepo.freebsd.org (8.18.1/8.18.1) with ESMTP id 5A7ExB8Y083978; Fri, 7 Nov 2025 14:59:11 GMT (envelope-from git@gitrepo.freebsd.org) Received: (from git@localhost) by gitrepo.freebsd.org (8.18.1/8.18.1/Submit) id 5A7ExBJH083975; Fri, 7 Nov 2025 14:59:11 GMT (envelope-from git) Date: Fri, 7 Nov 2025 14:59:11 GMT Message-Id: <202511071459.5A7ExBJH083975@gitrepo.freebsd.org> To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-branches@FreeBSD.org From: Mark Johnston Subject: git: 300a8977bcfd - stable/15 - vmm: Fix a deadlock between vm_smp_rendezvous() and vcpu_lock_all() List-Id: Commits to the stable branches of the FreeBSD src repository List-Archive: https://lists.freebsd.org/archives/dev-commits-src-branches List-Help: List-Post: List-Subscribe: List-Unsubscribe: X-BeenThere: dev-commits-src-branches@freebsd.org Sender: owner-dev-commits-src-branches@FreeBSD.org MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-Git-Committer: markj X-Git-Repository: src X-Git-Refname: refs/heads/stable/15 X-Git-Reftype: branch X-Git-Commit: 300a8977bcfd2f43bc6df81d9bdad6b79a740729 Auto-Submitted: auto-generated The branch stable/15 has been updated by markj: URL: https://cgit.FreeBSD.org/src/commit/?id=300a8977bcfd2f43bc6df81d9bdad6b79a740729 commit 300a8977bcfd2f43bc6df81d9bdad6b79a740729 Author: Mark Johnston AuthorDate: 2025-10-17 13:09:38 +0000 Commit: Mark Johnston CommitDate: 2025-11-07 14:10:36 +0000 vmm: Fix a deadlock between vm_smp_rendezvous() and vcpu_lock_all() vm_smp_rendezvous() invokes a callback on all vCPUs, blocking the initiator until all vCPUs have responded. vcpu_lock_all() blocks each vCPU by waiting for it to go idle and setting the vCPU state to frozen. These two operations can deadlock on each other, particularly when booting a Windows guest, when vcpu_lock_all() blocks waiting for a rendezvous initiator, and the initiator is blocked waiting for the vCPU thread which called vcpu_lock_all() to invoke the rendezvous callback. Implement vcpu_lock_all() in a way that avoids deadlocks with vm_smp_rendezvous(). In particular, when traversing vCPUs, invoke the rendezvous callback on the vCPU's behalf to help the initiator finish. We can only safely do so when the vCPU is IDLE or we have already locked it, otherwise we may be racing with the target vCPU thread. Thus: - Use an exclusive lock to serialize vcpu_lock_all() callers, which lets us lock vCPUs out of order without fear of deadlock with parallel vcpu_lock_all() callers. - If a rendezvous is pending, lock all idle vCPUs and invoke the callback on their behalf. If the vcpu_lock_all() caller is itself a vCPU thread, this will handle that thread. - Block waiting for all non-idle vCPUs to idle, or until one of them initiates a rendezvous, in which case we go back and invoke callbacks on behalf of already-locked vCPUs. Note that on !amd64 no changes are needed since there is no rendezvous mechanism, so there is a separate vcpu_set_state_all() for them based on the previous vcpu_lock_all(). These will be merged together once vcpu state handling is consolidated into sys/dev/vmm. Reviewed by: corvink (previous version) MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D52968 (cherry picked from commit f39768e52e513264da60add0ca2412bddda271ff) --- sys/amd64/include/vmm.h | 3 +- sys/amd64/vmm/vmm.c | 177 ++++++++++++++++++++++++++++++++++++++++-------- sys/arm64/include/vmm.h | 2 +- sys/arm64/vmm/vmm.c | 4 +- sys/dev/vmm/vmm_dev.c | 28 ++++++-- sys/riscv/include/vmm.h | 2 +- sys/riscv/vmm/vmm.c | 4 +- 7 files changed, 178 insertions(+), 42 deletions(-) diff --git a/sys/amd64/include/vmm.h b/sys/amd64/include/vmm.h index 66d8991d36e8..ad67510fecf3 100644 --- a/sys/amd64/include/vmm.h +++ b/sys/amd64/include/vmm.h @@ -237,7 +237,7 @@ extern u_int vm_maxcpu; /* maximum virtual cpus */ int vm_create(const char *name, struct vm **retvm); struct vcpu *vm_alloc_vcpu(struct vm *vm, int vcpuid); void vm_disable_vcpu_creation(struct vm *vm); -void vm_slock_vcpus(struct vm *vm); +void vm_lock_vcpus(struct vm *vm); void vm_unlock_vcpus(struct vm *vm); void vm_destroy(struct vm *vm); int vm_reinit(struct vm *vm); @@ -362,6 +362,7 @@ enum vcpu_state { }; int vcpu_set_state(struct vcpu *vcpu, enum vcpu_state state, bool from_idle); +int vcpu_set_state_all(struct vm *vm, enum vcpu_state state); enum vcpu_state vcpu_get_state(struct vcpu *vcpu, int *hostcpu); static int __inline diff --git a/sys/amd64/vmm/vmm.c b/sys/amd64/vmm/vmm.c index 7254ebed6097..473887240b9b 100644 --- a/sys/amd64/vmm/vmm.c +++ b/sys/amd64/vmm/vmm.c @@ -562,9 +562,9 @@ vm_alloc_vcpu(struct vm *vm, int vcpuid) } void -vm_slock_vcpus(struct vm *vm) +vm_lock_vcpus(struct vm *vm) { - sx_slock(&vm->vcpus_init_lock); + sx_xlock(&vm->vcpus_init_lock); } void @@ -991,6 +991,54 @@ save_guest_fpustate(struct vcpu *vcpu) static VMM_STAT(VCPU_IDLE_TICKS, "number of ticks vcpu was idle"); +/* + * Invoke the rendezvous function on the specified vcpu if applicable. Return + * true if the rendezvous is finished, false otherwise. + */ +static bool +vm_rendezvous(struct vcpu *vcpu) +{ + struct vm *vm = vcpu->vm; + int vcpuid; + + mtx_assert(&vcpu->vm->rendezvous_mtx, MA_OWNED); + KASSERT(vcpu->vm->rendezvous_func != NULL, + ("vm_rendezvous: no rendezvous pending")); + + /* 'rendezvous_req_cpus' must be a subset of 'active_cpus' */ + CPU_AND(&vm->rendezvous_req_cpus, &vm->rendezvous_req_cpus, + &vm->active_cpus); + + vcpuid = vcpu->vcpuid; + if (CPU_ISSET(vcpuid, &vm->rendezvous_req_cpus) && + !CPU_ISSET(vcpuid, &vm->rendezvous_done_cpus)) { + VMM_CTR0(vcpu, "Calling rendezvous func"); + (*vm->rendezvous_func)(vcpu, vm->rendezvous_arg); + CPU_SET(vcpuid, &vm->rendezvous_done_cpus); + } + if (CPU_CMP(&vm->rendezvous_req_cpus, + &vm->rendezvous_done_cpus) == 0) { + VMM_CTR0(vcpu, "Rendezvous completed"); + CPU_ZERO(&vm->rendezvous_req_cpus); + vm->rendezvous_func = NULL; + wakeup(&vm->rendezvous_func); + return (true); + } + return (false); +} + +static void +vcpu_wait_idle(struct vcpu *vcpu) +{ + KASSERT(vcpu->state != VCPU_IDLE, ("vcpu already idle")); + + vcpu->reqidle = 1; + vcpu_notify_event_locked(vcpu, false); + VMM_CTR1(vcpu, "vcpu state change from %s to " + "idle requested", vcpu_state2str(vcpu->state)); + msleep_spin(&vcpu->state, &vcpu->mtx, "vmstat", hz); +} + static int vcpu_set_state_locked(struct vcpu *vcpu, enum vcpu_state newstate, bool from_idle) @@ -1005,13 +1053,8 @@ vcpu_set_state_locked(struct vcpu *vcpu, enum vcpu_state newstate, * ioctl() operating on a vcpu at any point. */ if (from_idle) { - while (vcpu->state != VCPU_IDLE) { - vcpu->reqidle = 1; - vcpu_notify_event_locked(vcpu, false); - VMM_CTR1(vcpu, "vcpu state change from %s to " - "idle requested", vcpu_state2str(vcpu->state)); - msleep_spin(&vcpu->state, &vcpu->mtx, "vmstat", hz); - } + while (vcpu->state != VCPU_IDLE) + vcpu_wait_idle(vcpu); } else { KASSERT(vcpu->state != VCPU_IDLE, ("invalid transition from " "vcpu idle state")); @@ -1063,6 +1106,95 @@ vcpu_set_state_locked(struct vcpu *vcpu, enum vcpu_state newstate, return (0); } +/* + * Try to lock all of the vCPUs in the VM while taking care to avoid deadlocks + * with vm_smp_rendezvous(). + * + * The complexity here suggests that the rendezvous mechanism needs a rethink. + */ +int +vcpu_set_state_all(struct vm *vm, enum vcpu_state newstate) +{ + cpuset_t locked; + struct vcpu *vcpu; + int error, i; + uint16_t maxcpus; + + KASSERT(newstate != VCPU_IDLE, + ("vcpu_set_state_all: invalid target state %d", newstate)); + + error = 0; + CPU_ZERO(&locked); + maxcpus = vm->maxcpus; + + mtx_lock(&vm->rendezvous_mtx); +restart: + if (vm->rendezvous_func != NULL) { + /* + * If we have a pending rendezvous, then the initiator may be + * blocked waiting for other vCPUs to execute the callback. The + * current thread may be a vCPU thread so we must not block + * waiting for the initiator, otherwise we get a deadlock. + * Thus, execute the callback on behalf of any idle vCPUs. + */ + for (i = 0; i < maxcpus; i++) { + vcpu = vm_vcpu(vm, i); + if (vcpu == NULL) + continue; + vcpu_lock(vcpu); + if (vcpu->state == VCPU_IDLE) { + (void)vcpu_set_state_locked(vcpu, VCPU_FROZEN, + true); + CPU_SET(i, &locked); + } + if (CPU_ISSET(i, &locked)) { + /* + * We can safely execute the callback on this + * vCPU's behalf. + */ + vcpu_unlock(vcpu); + (void)vm_rendezvous(vcpu); + vcpu_lock(vcpu); + } + vcpu_unlock(vcpu); + } + } + + /* + * Now wait for remaining vCPUs to become idle. This may include the + * initiator of a rendezvous that is currently blocked on the rendezvous + * mutex. + */ + CPU_FOREACH_ISCLR(i, &locked) { + if (i >= maxcpus) + break; + vcpu = vm_vcpu(vm, i); + if (vcpu == NULL) + continue; + vcpu_lock(vcpu); + while (vcpu->state != VCPU_IDLE) { + mtx_unlock(&vm->rendezvous_mtx); + vcpu_wait_idle(vcpu); + vcpu_unlock(vcpu); + mtx_lock(&vm->rendezvous_mtx); + if (vm->rendezvous_func != NULL) + goto restart; + vcpu_lock(vcpu); + } + error = vcpu_set_state_locked(vcpu, newstate, true); + vcpu_unlock(vcpu); + if (error != 0) { + /* Roll back state changes. */ + CPU_FOREACH_ISSET(i, &locked) + (void)vcpu_set_state(vcpu, VCPU_IDLE, false); + break; + } + CPU_SET(i, &locked); + } + mtx_unlock(&vm->rendezvous_mtx); + return (error); +} + static void vcpu_require_state(struct vcpu *vcpu, enum vcpu_state newstate) { @@ -1084,36 +1216,23 @@ vcpu_require_state_locked(struct vcpu *vcpu, enum vcpu_state newstate) static int vm_handle_rendezvous(struct vcpu *vcpu) { - struct vm *vm = vcpu->vm; + struct vm *vm; struct thread *td; - int error, vcpuid; - error = 0; - vcpuid = vcpu->vcpuid; td = curthread; + vm = vcpu->vm; + mtx_lock(&vm->rendezvous_mtx); while (vm->rendezvous_func != NULL) { - /* 'rendezvous_req_cpus' must be a subset of 'active_cpus' */ - CPU_AND(&vm->rendezvous_req_cpus, &vm->rendezvous_req_cpus, &vm->active_cpus); - - if (CPU_ISSET(vcpuid, &vm->rendezvous_req_cpus) && - !CPU_ISSET(vcpuid, &vm->rendezvous_done_cpus)) { - VMM_CTR0(vcpu, "Calling rendezvous func"); - (*vm->rendezvous_func)(vcpu, vm->rendezvous_arg); - CPU_SET(vcpuid, &vm->rendezvous_done_cpus); - } - if (CPU_CMP(&vm->rendezvous_req_cpus, - &vm->rendezvous_done_cpus) == 0) { - VMM_CTR0(vcpu, "Rendezvous completed"); - CPU_ZERO(&vm->rendezvous_req_cpus); - vm->rendezvous_func = NULL; - wakeup(&vm->rendezvous_func); + if (vm_rendezvous(vcpu)) break; - } + VMM_CTR0(vcpu, "Wait for rendezvous completion"); mtx_sleep(&vm->rendezvous_func, &vm->rendezvous_mtx, 0, "vmrndv", hz); if (td_ast_pending(td, TDA_SUSPEND)) { + int error; + mtx_unlock(&vm->rendezvous_mtx); error = thread_check_susp(td, true); if (error != 0) diff --git a/sys/arm64/include/vmm.h b/sys/arm64/include/vmm.h index 84b286a60b38..696a69669a2a 100644 --- a/sys/arm64/include/vmm.h +++ b/sys/arm64/include/vmm.h @@ -177,7 +177,7 @@ DECLARE_VMMOPS_FUNC(int, restore_tsc, (void *vcpui, uint64_t now)); int vm_create(const char *name, struct vm **retvm); struct vcpu *vm_alloc_vcpu(struct vm *vm, int vcpuid); void vm_disable_vcpu_creation(struct vm *vm); -void vm_slock_vcpus(struct vm *vm); +void vm_lock_vcpus(struct vm *vm); void vm_unlock_vcpus(struct vm *vm); void vm_destroy(struct vm *vm); int vm_reinit(struct vm *vm); diff --git a/sys/arm64/vmm/vmm.c b/sys/arm64/vmm/vmm.c index 4f266de0cfb1..14ea26c3668c 100644 --- a/sys/arm64/vmm/vmm.c +++ b/sys/arm64/vmm/vmm.c @@ -469,9 +469,9 @@ vm_alloc_vcpu(struct vm *vm, int vcpuid) } void -vm_slock_vcpus(struct vm *vm) +vm_lock_vcpus(struct vm *vm) { - sx_slock(&vm->vcpus_init_lock); + sx_xlock(&vm->vcpus_init_lock); } void diff --git a/sys/dev/vmm/vmm_dev.c b/sys/dev/vmm/vmm_dev.c index 5b857a171e94..ebbceb25b69e 100644 --- a/sys/dev/vmm/vmm_dev.c +++ b/sys/dev/vmm/vmm_dev.c @@ -121,18 +121,18 @@ vcpu_unlock_one(struct vcpu *vcpu) vcpu_set_state(vcpu, VCPU_IDLE, false); } +#ifndef __amd64__ static int -vcpu_lock_all(struct vmmdev_softc *sc) +vcpu_set_state_all(struct vm *vm, enum vcpu_state newstate) { struct vcpu *vcpu; int error; uint16_t i, j, maxcpus; error = 0; - vm_slock_vcpus(sc->vm); - maxcpus = vm_get_maxcpus(sc->vm); + maxcpus = vm_get_maxcpus(vm); for (i = 0; i < maxcpus; i++) { - vcpu = vm_vcpu(sc->vm, i); + vcpu = vm_vcpu(vm, i); if (vcpu == NULL) continue; error = vcpu_lock_one(vcpu); @@ -142,16 +142,32 @@ vcpu_lock_all(struct vmmdev_softc *sc) if (error) { for (j = 0; j < i; j++) { - vcpu = vm_vcpu(sc->vm, j); + vcpu = vm_vcpu(vm, j); if (vcpu == NULL) continue; vcpu_unlock_one(vcpu); } - vm_unlock_vcpus(sc->vm); } return (error); } +#endif + +static int +vcpu_lock_all(struct vmmdev_softc *sc) +{ + int error; + + /* + * Serialize vcpu_lock_all() callers. Individual vCPUs are not locked + * in a consistent order so we need to serialize to avoid deadlocks. + */ + vm_lock_vcpus(sc->vm); + error = vcpu_set_state_all(sc->vm, VCPU_FROZEN); + if (error != 0) + vm_unlock_vcpus(sc->vm); + return (error); +} static void vcpu_unlock_all(struct vmmdev_softc *sc) diff --git a/sys/riscv/include/vmm.h b/sys/riscv/include/vmm.h index bc00474ed0fd..e227dd825966 100644 --- a/sys/riscv/include/vmm.h +++ b/sys/riscv/include/vmm.h @@ -149,7 +149,7 @@ DECLARE_VMMOPS_FUNC(void, vmspace_free, (struct vmspace *vmspace)); int vm_create(const char *name, struct vm **retvm); struct vcpu *vm_alloc_vcpu(struct vm *vm, int vcpuid); void vm_disable_vcpu_creation(struct vm *vm); -void vm_slock_vcpus(struct vm *vm); +void vm_lock_vcpus(struct vm *vm); void vm_unlock_vcpus(struct vm *vm); void vm_destroy(struct vm *vm); int vm_reinit(struct vm *vm); diff --git a/sys/riscv/vmm/vmm.c b/sys/riscv/vmm/vmm.c index 7944c3d7a214..a9eb9d144336 100644 --- a/sys/riscv/vmm/vmm.c +++ b/sys/riscv/vmm/vmm.c @@ -342,9 +342,9 @@ vm_alloc_vcpu(struct vm *vm, int vcpuid) } void -vm_slock_vcpus(struct vm *vm) +vm_lock_vcpus(struct vm *vm) { - sx_slock(&vm->vcpus_init_lock); + sx_xlock(&vm->vcpus_init_lock); } void