Date: Thu, 26 Jan 2023 22:12:07 GMT
Message-Id: <202301262212.30QMC7iN022258@gitrepo.freebsd.org>
To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org,
    dev-commits-src-branches@FreeBSD.org
From: John Baldwin
Subject: git: aca0d0b421d3 - stable/13 - vmm: Use an sx lock to protect the memory map.
List-Id: Commits to the stable branches of the FreeBSD src repository
List-Archive: https://lists.freebsd.org/archives/dev-commits-src-branches
X-Git-Committer: jhb
X-Git-Repository: src
X-Git-Refname: refs/heads/stable/13
X-Git-Reftype: branch
X-Git-Commit: aca0d0b421d3ee7b844edf949902c0c189279ad7

The branch stable/13 has been updated by jhb:

URL: https://cgit.FreeBSD.org/src/commit/?id=aca0d0b421d3ee7b844edf949902c0c189279ad7

commit aca0d0b421d3ee7b844edf949902c0c189279ad7
Author:     John Baldwin
AuthorDate: 2022-11-18 18:04:37 +0000
Commit:     John Baldwin
CommitDate: 2023-01-26 22:05:02 +0000

    vmm: Use an sx lock to protect the memory map.

    Previously bhyve obtained a "read lock" on the memory map for ioctls
    needing to read the map by locking the last vCPU.  This is now replaced
    by a new per-VM sx lock.  Modifying the map requires exclusively
    locking the sx lock as well as locking all existing vCPUs.  Reading the
    map requires either locking one vCPU or the sx lock.  This permits
    safely modifying or querying the memory map while some vCPUs do not
    exist, which will be true in a future commit.

    Reviewed by:            corvink, markj
    Differential Revision:  https://reviews.freebsd.org/D37172

    (cherry picked from commit 67b69e76e8eecfd204f6de636d622a1d681c8d7e)
---
 sys/amd64/include/vmm.h |  3 +++
 sys/amd64/vmm/vmm.c     | 55 +++++++++++++++++++++++++++--------------
 sys/amd64/vmm/vmm_dev.c | 66 ++++++++++++++++++++++++-------------------------
 3 files changed, 72 insertions(+), 52 deletions(-)

diff --git a/sys/amd64/include/vmm.h b/sys/amd64/include/vmm.h
index 7c346d7ed4ed..b7506faaee92 100644
--- a/sys/amd64/include/vmm.h
+++ b/sys/amd64/include/vmm.h
@@ -231,6 +231,9 @@ int vm_set_topology(struct vm *vm, uint16_t sockets, uint16_t cores,
 /*
  * APIs that modify the guest memory map require all vcpus to be frozen.
  */
+void vm_slock_memsegs(struct vm *vm);
+void vm_xlock_memsegs(struct vm *vm);
+void vm_unlock_memsegs(struct vm *vm);
 int vm_mmap_memseg(struct vm *vm, vm_paddr_t gpa, int segid, vm_ooffset_t off,
     size_t len, int prot, int flags);
 int vm_munmap_memseg(struct vm *vm, vm_paddr_t gpa, size_t len);
diff --git a/sys/amd64/vmm/vmm.c b/sys/amd64/vmm/vmm.c
index 4659495ca5a2..87f1d9e45d58 100644
--- a/sys/amd64/vmm/vmm.c
+++ b/sys/amd64/vmm/vmm.c
@@ -46,6 +46,7 @@ __FBSDID("$FreeBSD$");
 #include
 #include
 #include
+#include <sys/sx.h>
 #include
 #include
@@ -155,6 +156,11 @@ struct mem_map {
  * (o) initialized the first time the VM is created
  * (i) initialized when VM is created and when it is reinitialized
  * (x) initialized before use
+ *
+ * Locking:
+ * [m] mem_segs_lock
+ * [r] rendezvous_mtx
+ * [v] reads require one frozen vcpu, writes require freezing all vcpus
  */
 struct vm {
 	void		*cookie;		/* (i) cpu-specific data */
@@ -170,13 +176,13 @@ struct vm {
 	int		suspend;		/* (i) stop VM execution */
 	volatile cpuset_t suspended_cpus;	/* (i) suspended vcpus */
 	volatile cpuset_t halted_cpus;		/* (x) cpus in a hard halt */
-	cpuset_t	rendezvous_req_cpus;	/* (x) rendezvous requested */
-	cpuset_t	rendezvous_done_cpus;	/* (x) rendezvous finished */
-	void		*rendezvous_arg;	/* (x) rendezvous func/arg */
+	cpuset_t	rendezvous_req_cpus;	/* (x) [r] rendezvous requested */
+	cpuset_t	rendezvous_done_cpus;	/* (x) [r] rendezvous finished */
+	void		*rendezvous_arg;	/* (x) [r] rendezvous func/arg */
 	vm_rendezvous_func_t rendezvous_func;
 	struct mtx	rendezvous_mtx;		/* (o) rendezvous lock */
-	struct mem_map	mem_maps[VM_MAX_MEMMAPS]; /* (i) guest address space */
-	struct mem_seg	mem_segs[VM_MAX_MEMSEGS]; /* (o) guest memory regions */
+	struct mem_map	mem_maps[VM_MAX_MEMMAPS]; /* (i) [m+v] guest address space */
+	struct mem_seg	mem_segs[VM_MAX_MEMSEGS]; /* (o) [m+v] guest memory regions */
 	struct vmspace	*vmspace;		/* (o) guest's address space */
 	char		name[VM_MAX_NAMELEN+1];	/* (o) virtual machine name */
 	struct vcpu	vcpu[VM_MAXCPU];	/* (i) guest vcpus */
@@ -185,6 +191,7 @@ struct vm {
 	uint16_t	cores;			/* (o) num of cores/socket */
 	uint16_t	threads;		/* (o) num of threads/core */
 	uint16_t	maxcpus;		/* (o) max pluggable cpus */
+	struct sx	mem_segs_lock;		/* (o) */
 };

 #define	VMM_CTR0(vcpu, format)						\
@@ -518,6 +525,7 @@ vm_create(const char *name, struct vm **retvm)
 	strcpy(vm->name, name);
 	vm->vmspace = vmspace;
 	mtx_init(&vm->rendezvous_mtx, "vm rendezvous lock", 0, MTX_DEF);
+	sx_init(&vm->mem_segs_lock, "vm mem_segs");

 	vm->sockets = 1;
 	vm->cores = cores_per_package;	/* XXX backwards compatibility */
@@ -609,6 +617,7 @@ vm_cleanup(struct vm *vm, bool destroy)
 		vmmops_vmspace_free(vm->vmspace);
 		vm->vmspace = NULL;

+		sx_destroy(&vm->mem_segs_lock);
 		mtx_destroy(&vm->rendezvous_mtx);
 	}
 }
@@ -645,6 +654,24 @@ vm_name(struct vm *vm)
 	return (vm->name);
 }

+void
+vm_slock_memsegs(struct vm *vm)
+{
+	sx_slock(&vm->mem_segs_lock);
+}
+
+void
+vm_xlock_memsegs(struct vm *vm)
+{
+	sx_xlock(&vm->mem_segs_lock);
+}
+
+void
+vm_unlock_memsegs(struct vm *vm)
+{
+	sx_unlock(&vm->mem_segs_lock);
+}
+
 int
 vm_map_mmio(struct vm *vm, vm_paddr_t gpa, size_t len, vm_paddr_t hpa)
 {
@@ -702,6 +729,8 @@ vm_alloc_memseg(struct vm *vm, int ident, size_t len, bool sysmem)
 	struct mem_seg *seg;
 	vm_object_t obj;

+	sx_assert(&vm->mem_segs_lock, SX_XLOCKED);
+
 	if (ident < 0 || ident >= VM_MAX_MEMSEGS)
 		return (EINVAL);
@@ -732,6 +761,8 @@ vm_get_memseg(struct vm *vm, int ident, size_t *len, bool *sysmem,
 {
 	struct mem_seg *seg;

+	sx_assert(&vm->mem_segs_lock, SX_LOCKED);
+
 	if (ident < 0 || ident >= VM_MAX_MEMSEGS)
 		return (EINVAL);
@@ -1075,19 +1106,7 @@ void *
 vm_gpa_hold_global(struct vm *vm, vm_paddr_t gpa, size_t len, int reqprot,
     void **cookie)
 {
-#ifdef INVARIANTS
-	/*
-	 * All vcpus are frozen by ioctls that modify the memory map
-	 * (e.g. VM_MMAP_MEMSEG). Therefore 'vm->memmap[]' stability is
-	 * guaranteed if at least one vcpu is in the VCPU_FROZEN state.
-	 */
-	int state;
-	for (int i = 0; i < vm->maxcpus; i++) {
-		state = vcpu_get_state(vm_vcpu(vm, i), NULL);
-		KASSERT(state == VCPU_FROZEN, ("%s: invalid vcpu state %d",
-		    __func__, state));
-	}
-#endif
+	sx_assert(&vm->mem_segs_lock, SX_LOCKED);
 	return (_vm_gpa_hold(vm, gpa, len, reqprot, cookie));
 }
diff --git a/sys/amd64/vmm/vmm_dev.c b/sys/amd64/vmm/vmm_dev.c
index 249131b16464..1b8b1e6d388f 100644
--- a/sys/amd64/vmm/vmm_dev.c
+++ b/sys/amd64/vmm/vmm_dev.c
@@ -221,8 +221,6 @@ vmmdev_rw(struct cdev *cdev, struct uio *uio, int flags)
 	vm_paddr_t gpa, maxaddr;
 	void *hpa, *cookie;
 	struct vmmdev_softc *sc;
-	struct vcpu *vcpu;
-	uint16_t lastcpu;

 	error = vmm_priv_check(curthread->td_ucred);
 	if (error)
 		return (error);
@@ -233,13 +231,9 @@ vmmdev_rw(struct cdev *cdev, struct uio *uio, int flags)
 		return (ENXIO);

 	/*
-	 * Get a read lock on the guest memory map by freezing any vcpu.
+	 * Get a read lock on the guest memory map.
 	 */
-	lastcpu = vm_get_maxcpus(sc->vm) - 1;
-	error = vcpu_lock_one(sc, lastcpu);
-	if (error)
-		return (error);
-	vcpu = vm_vcpu(sc->vm, lastcpu);
+	vm_slock_memsegs(sc->vm);

 	prot = (uio->uio_rw == UIO_WRITE ? VM_PROT_WRITE : VM_PROT_READ);
 	maxaddr = vmm_sysmem_maxaddr(sc->vm);
@@ -256,7 +250,7 @@ vmmdev_rw(struct cdev *cdev, struct uio *uio, int flags)
 		 * Since this device does not support lseek(2), dd(1) will
 		 * read(2) blocks of data to simulate the lseek(2).
 		 */
-		hpa = vm_gpa_hold(vcpu, gpa, c, prot, &cookie);
+		hpa = vm_gpa_hold_global(sc->vm, gpa, c, prot, &cookie);
 		if (hpa == NULL) {
 			if (uio->uio_rw == UIO_READ && gpa < maxaddr)
 				error = uiomove(__DECONST(void *, zero_region),
@@ -268,7 +262,7 @@ vmmdev_rw(struct cdev *cdev, struct uio *uio, int flags)
 			vm_gpa_release(cookie);
 		}
 	}
-	vcpu_unlock_one(sc, lastcpu);
+	vm_unlock_memsegs(sc->vm);
 	return (error);
 }
@@ -420,6 +414,7 @@ vmmdev_ioctl(struct cdev *cdev, u_long cmd, caddr_t data, int fflag,
 	struct vm_readwrite_kernemu_device *kernemu;
 	uint64_t *regvals;
 	int *regnums;
+	bool memsegs_locked;
 #ifdef BHYVE_SNAPSHOT
 	struct vm_snapshot_meta *snapshot_meta;
 #endif
@@ -435,6 +430,7 @@ vmmdev_ioctl(struct cdev *cdev, u_long cmd, caddr_t data, int fflag,
 	vcpuid = -1;
 	vcpu = NULL;
 	state_changed = 0;
+	memsegs_locked = false;

 	/*
 	 * For VMM ioctls that operate on a single vCPU, lookup the
@@ -476,10 +472,6 @@ vmmdev_ioctl(struct cdev *cdev, u_long cmd, caddr_t data, int fflag,
 		vcpu = vm_vcpu(sc->vm, vcpuid);
 		break;

-	case VM_MAP_PPTDEV_MMIO:
-	case VM_UNMAP_PPTDEV_MMIO:
-	case VM_BIND_PPTDEV:
-	case VM_UNBIND_PPTDEV:
 #ifdef COMPAT_FREEBSD12
 	case VM_ALLOC_MEMSEG_FBSD12:
 #endif
 	case VM_ALLOC_MEMSEG:
 	case VM_MMAP_MEMSEG:
 	case VM_MUNMAP_MEMSEG:
 	case VM_REINIT:
+		/*
+		 * ioctls that modify the memory map must lock memory
+		 * segments exclusively.
+		 */
+		vm_xlock_memsegs(sc->vm);
+		memsegs_locked = true;
+		/* FALLTHROUGH */
+	case VM_MAP_PPTDEV_MMIO:
+	case VM_UNMAP_PPTDEV_MMIO:
+	case VM_BIND_PPTDEV:
+	case VM_UNBIND_PPTDEV:
+#ifdef BHYVE_SNAPSHOT
+	case VM_SNAPSHOT_REQ:
+	case VM_RESTORE_TIME:
+#endif
 		/*
 		 * ioctls that operate on the entire virtual machine must
 		 * prevent all vcpus from running.
@@ -503,14 +510,10 @@ vmmdev_ioctl(struct cdev *cdev, u_long cmd, caddr_t data, int fflag,
 	case VM_GET_MEMSEG:
 	case VM_MMAP_GETNEXT:
 		/*
-		 * Lock a vcpu to make sure that the memory map cannot be
-		 * modified while it is being inspected.
+		 * Lock the memory map while it is being inspected.
 		 */
-		vcpuid = vm_get_maxcpus(sc->vm) - 1;
-		error = vcpu_lock_one(sc, vcpuid);
-		if (error)
-			goto done;
-		state_changed = 1;
+		vm_slock_memsegs(sc->vm);
+		memsegs_locked = true;
 		break;

 #ifdef COMPAT_FREEBSD13
@@ -958,6 +961,8 @@ vmmdev_ioctl(struct cdev *cdev, u_long cmd, caddr_t data, int fflag,
 		vcpu_unlock_one(sc, vcpuid);
 	else if (state_changed == 2)
 		vcpu_unlock_all(sc);
+	if (memsegs_locked)
+		vm_unlock_memsegs(sc->vm);

 done:
 	/*
@@ -978,7 +983,6 @@ vmmdev_mmap_single(struct cdev *cdev, vm_ooffset_t *offset, vm_size_t mapsize,
 	size_t len;
 	vm_ooffset_t segoff, first, last;
 	int error, found, segid;
-	uint16_t lastcpu;
 	bool sysmem;

 	error = vmm_priv_check(curthread->td_ucred);
@@ -997,12 +1001,9 @@ vmmdev_mmap_single(struct cdev *cdev, vm_ooffset_t *offset, vm_size_t mapsize,
 	}

 	/*
-	 * Get a read lock on the guest memory map by freezing any vcpu.
+	 * Get a read lock on the guest memory map.
 	 */
-	lastcpu = vm_get_maxcpus(sc->vm) - 1;
-	error = vcpu_lock_one(sc, lastcpu);
-	if (error)
-		return (error);
+	vm_slock_memsegs(sc->vm);

 	gpa = 0;
 	found = 0;
@@ -1029,7 +1030,7 @@ vmmdev_mmap_single(struct cdev *cdev, vm_ooffset_t *offset, vm_size_t mapsize,
 			error = EINVAL;
 		}
 	}
-	vcpu_unlock_one(sc, lastcpu);
+	vm_unlock_memsegs(sc->vm);
 	return (error);
 }
@@ -1242,7 +1243,6 @@ devmem_mmap_single(struct cdev *cdev, vm_ooffset_t *offset, vm_size_t len,
 	vm_ooffset_t first, last;
 	size_t seglen;
 	int error;
-	uint16_t lastcpu;
 	bool sysmem;

 	dsc = cdev->si_drv1;
@@ -1256,16 +1256,14 @@ devmem_mmap_single(struct cdev *cdev, vm_ooffset_t *offset, vm_size_t len,
 	if ((nprot & PROT_EXEC) || first < 0 || first >= last)
 		return (EINVAL);

-	lastcpu = vm_get_maxcpus(dsc->sc->vm) - 1;
-	error = vcpu_lock_one(dsc->sc, lastcpu);
-	if (error)
-		return (error);
+	vm_slock_memsegs(dsc->sc->vm);

 	error = vm_get_memseg(dsc->sc->vm, dsc->segid, &seglen, &sysmem, objp);
 	KASSERT(error == 0 && !sysmem && *objp != NULL,
 	    ("%s: invalid devmem segment %d", __func__, dsc->segid));
-	vcpu_unlock_one(dsc->sc, lastcpu);
+
+	vm_unlock_memsegs(dsc->sc->vm);

 	if (seglen >= last) {
 		vm_object_reference(*objp);
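
For readers skimming the diff, the new locking protocol reduces to the
pattern sketched below.  This is only an illustration distilled from the
change above, not code from the tree: the wrapper struct and the two
helper functions are hypothetical stand-ins, while sx_slock(), sx_xlock(),
sx_unlock() and sx_assert() are the standard sx(9) primitives the commit
actually uses.

    /*
     * Illustrative sketch only: error handling and the real ioctl
     * plumbing from vmm_dev.c are omitted.
     */
    #include <sys/param.h>
    #include <sys/lock.h>
    #include <sys/sx.h>

    struct vm_sketch {                      /* hypothetical stand-in for struct vm */
            struct sx mem_segs_lock;        /* protects mem_segs[] / mem_maps[] */
            /* ... vcpus, vmspace, etc. ... */
    };

    /* Readers: VM_GET_MEMSEG, VM_MMAP_GETNEXT, vmmdev_rw(), the mmap paths. */
    static void
    query_memory_map(struct vm_sketch *vm)
    {
            sx_slock(&vm->mem_segs_lock);   /* shared: the map cannot change */
            /* ... walk mem_segs[] / mem_maps[] ... */
            sx_unlock(&vm->mem_segs_lock);
    }

    /* Writers: VM_ALLOC_MEMSEG, VM_MMAP_MEMSEG, VM_MUNMAP_MEMSEG, VM_REINIT. */
    static void
    modify_memory_map(struct vm_sketch *vm)
    {
            sx_xlock(&vm->mem_segs_lock);   /* exclusive ... */
            /*
             * ... and, as vmmdev_ioctl() does, all existing vcpus are
             * also frozen (vcpu_lock_all()) before the map is changed.
             */
            /* ... update mem_segs[] / mem_maps[] ... */
            sx_unlock(&vm->mem_segs_lock);
    }

The asserts in the diff reflect the same split: vm_alloc_memseg() demands
SX_XLOCKED, while vm_get_memseg() and vm_gpa_hold_global() only require
SX_LOCKED, i.e. a shared or exclusive hold is enough for readers.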