From nobody Tue Jan 11 18:18:13 2022
X-Original-To: dev-commits-src-all@mlmmj.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
	by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 1D2BF1941580;
	Tue, 11 Jan 2022 18:18:16 +0000 (UTC)
	(envelope-from git@FreeBSD.org)
Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256
	 client-signature RSA-PSS (4096 bits) client-digest SHA256)
	(Client CN "mxrelay.nyi.freebsd.org", Issuer "R3" (verified OK))
	by mx1.freebsd.org (Postfix) with ESMTPS id 4JYJpY5qxsz4TZB;
	Tue, 11 Jan 2022 18:18:13 +0000 (UTC)
	(envelope-from git@FreeBSD.org)
Received: from gitrepo.freebsd.org (gitrepo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:5])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256)
	(Client did not present a certificate)
	by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id 28A2A1E118;
	Tue, 11 Jan 2022 18:18:13 +0000 (UTC)
	(envelope-from git@FreeBSD.org)
Received: from gitrepo.freebsd.org ([127.0.1.44])
	by gitrepo.freebsd.org (8.16.1/8.16.1) with ESMTP id 20BIIDjR000595;
	Tue, 11 Jan 2022 18:18:13 GMT
	(envelope-from git@gitrepo.freebsd.org)
Received: (from git@localhost)
	by gitrepo.freebsd.org (8.16.1/8.16.1/Submit) id 20BIIDgj000594;
	Tue, 11 Jan 2022 18:18:13 GMT
	(envelope-from git)
Date: Tue, 11 Jan 2022 18:18:13 GMT
Message-Id: <202201111818.20BIIDgj000594@gitrepo.freebsd.org>
To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org,
	dev-commits-src-branches@FreeBSD.org
From: Ed Maste
Subject: git: b8beb716be3d - releng/12.3 - MFC r368649 / 3fd989da by kib: amd64 pmap: fix PCID mode invalidations
List-Id: Commit messages for all branches of the src repository
List-Archive: https://lists.freebsd.org/archives/dev-commits-src-all
List-Help:
List-Post:
List-Subscribe:
List-Unsubscribe:
Sender: owner-dev-commits-src-all@freebsd.org
X-BeenThere: dev-commits-src-all@freebsd.org
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Git-Committer: emaste
X-Git-Repository: src
X-Git-Refname: refs/heads/releng/12.3
X-Git-Reftype: branch
X-Git-Commit: b8beb716be3d07b3918792e5759f91552fa527d5
Auto-Submitted: auto-generated
ARC-Authentication-Results: i=1; mx1.freebsd.org; none
X-ThisMailContainsUnwantedMimeParts: N

The branch releng/12.3 has been updated by emaste:

URL: https://cgit.FreeBSD.org/src/commit/?id=b8beb716be3d07b3918792e5759f91552fa527d5

commit b8beb716be3d07b3918792e5759f91552fa527d5
Author:     Andriy Gapon
AuthorDate: 2021-12-14 14:43:29 +0000
Commit:     Ed Maste
CommitDate: 2022-01-10 00:43:13 +0000

MFC r368649 / 3fd989da by kib: amd64 pmap: fix PCID mode
invalidations

r368649 fixed a regression in r362031 that was MFC-ed to stable/12 as a
part of r362572.  That commit reordered IPI send and local TLB flush in
TLB invalidations.  Without this fix we've been seeing problems with
stale memory content, where changes done under a mutex were not
immediately observed by another thread after taking the same mutex.
Those inconsistencies were correlated to copy-on-write faults for pages
containing the data.

The change needed some adaptations as I elected to skip two significant
intermediate changes:
- r363195 / dc43978a, amd64: allow parallel shootdown IPIs
- r363311 / 3ec7e169, amd64 pmap: microoptimize local shootdowns for
  PCID PTI configurations

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D33413

(cherry picked from commit 1820ca2154611d6f27ce5a5fdd561a16ac54fdd8)

Approved by:	so
Errata:		FreeBSD-EN-22:04.pcid
---
 sys/amd64/amd64/pmap.c | 273 ++++++++++++++++++++++++-------------------------
 sys/x86/x86/mp_x86.c   |  32 ++++--
 2 files changed, 153 insertions(+), 152 deletions(-)

diff --git a/sys/amd64/amd64/pmap.c b/sys/amd64/amd64/pmap.c
index c6f56c1cd73b..ee9777c788c1 100644
--- a/sys/amd64/amd64/pmap.c
+++ b/sys/amd64/amd64/pmap.c
@@ -2329,45 +2329,19 @@ pmap_invalidate_ept(pmap_t pmap)
 static cpuset_t
 pmap_invalidate_cpu_mask(pmap_t pmap)
 {
-
 	return (pmap == kernel_pmap ? all_cpus : pmap->pm_active);
 }
 
 static inline void
-pmap_invalidate_page_pcid(pmap_t pmap, vm_offset_t va,
-    const bool invpcid_works1)
+pmap_invalidate_preipi_pcid(pmap_t pmap)
 {
-	struct invpcid_descr d;
-	uint64_t kcr3, ucr3;
-	uint32_t pcid;
 	u_int cpuid, i;
 
+	sched_pin();
+
 	cpuid = PCPU_GET(cpuid);
-	if (pmap == PCPU_GET(curpmap)) {
-		if (pmap->pm_ucr3 != PMAP_NO_CR3) {
-			/*
-			 * Because pm_pcid is recalculated on a
-			 * context switch, we must disable switching.
-			 * Otherwise, we might use a stale value
-			 * below.
-			 */
-			critical_enter();
-			pcid = pmap->pm_pcids[cpuid].pm_pcid;
-			if (invpcid_works1) {
-				d.pcid = pcid | PMAP_PCID_USER_PT;
-				d.pad = 0;
-				d.addr = va;
-				invpcid(&d, INVPCID_ADDR);
-			} else {
-				kcr3 = pmap->pm_cr3 | pcid | CR3_PCID_SAVE;
-				ucr3 = pmap->pm_ucr3 | pcid |
-				    PMAP_PCID_USER_PT | CR3_PCID_SAVE;
-				pmap_pti_pcid_invlpg(ucr3, kcr3, va);
-			}
-			critical_exit();
-		}
-	} else
-		pmap->pm_pcids[cpuid].pm_gen = 0;
+	if (pmap != PCPU_GET(curpmap))
+		cpuid = 0xffffffff;	/* An impossible value */
 	CPU_FOREACH(i) {
 		if (cpuid != i)
@@ -2388,52 +2362,96 @@ pmap_invalidate_page_pcid(pmap_t pmap, vm_offset_t va,
 }
 
 static void
-pmap_invalidate_page_pcid_invpcid(pmap_t pmap, vm_offset_t va)
+pmap_invalidate_preipi_nopcid(pmap_t pmap __unused)
 {
+	sched_pin();
+}
 
-	pmap_invalidate_page_pcid(pmap, va, true);
+DEFINE_IFUNC(static, void, pmap_invalidate_preipi, (pmap_t), static)
+{
+	return (pmap_pcid_enabled ? pmap_invalidate_preipi_pcid :
+	    pmap_invalidate_preipi_nopcid);
+}
+
+static inline void
+pmap_invalidate_page_pcid_cb(pmap_t pmap, vm_offset_t va,
+    const bool invpcid_works1)
+{
+	struct invpcid_descr d;
+	uint64_t kcr3, ucr3;
+	uint32_t pcid;
+	u_int cpuid;
+
+	/*
+	 * Because pm_pcid is recalculated on a context switch, we
+	 * must ensure there is no preemption, not just pinning.
+	 * Otherwise, we might use a stale value below.
+	 */
+	CRITICAL_ASSERT(curthread);
+
+	/*
+	 * No need to do anything with user page tables invalidation
+	 * if there is no user page table.
+	 */
+	if (pmap->pm_ucr3 == PMAP_NO_CR3)
+		return;
+
+	cpuid = PCPU_GET(cpuid);
+
+	pcid = pmap->pm_pcids[cpuid].pm_pcid;
+	if (invpcid_works1) {
+		d.pcid = pcid | PMAP_PCID_USER_PT;
+		d.pad = 0;
+		d.addr = va;
+		invpcid(&d, INVPCID_ADDR);
+	} else {
+		kcr3 = pmap->pm_cr3 | pcid | CR3_PCID_SAVE;
+		ucr3 = pmap->pm_ucr3 | pcid | PMAP_PCID_USER_PT | CR3_PCID_SAVE;
+		pmap_pti_pcid_invlpg(ucr3, kcr3, va);
+	}
 }
 
 static void
-pmap_invalidate_page_pcid_noinvpcid(pmap_t pmap, vm_offset_t va)
+pmap_invalidate_page_pcid_invpcid_cb(pmap_t pmap, vm_offset_t va)
 {
+	pmap_invalidate_page_pcid_cb(pmap, va, true);
+}
 
-	pmap_invalidate_page_pcid(pmap, va, false);
+static void
+pmap_invalidate_page_pcid_noinvpcid_cb(pmap_t pmap, vm_offset_t va)
+{
+	pmap_invalidate_page_pcid_cb(pmap, va, false);
 }
 
 static void
-pmap_invalidate_page_nopcid(pmap_t pmap, vm_offset_t va)
+pmap_invalidate_page_nopcid_cb(pmap_t pmap __unused, vm_offset_t va __unused)
 {
 }
 
-DEFINE_IFUNC(static, void, pmap_invalidate_page_mode, (pmap_t, vm_offset_t),
+DEFINE_IFUNC(static, void, pmap_invalidate_page_cb, (pmap_t, vm_offset_t),
     static)
 {
-
 	if (pmap_pcid_enabled)
-		return (invpcid_works ? pmap_invalidate_page_pcid_invpcid :
-		    pmap_invalidate_page_pcid_noinvpcid);
-	return (pmap_invalidate_page_nopcid);
+		return (invpcid_works ? pmap_invalidate_page_pcid_invpcid_cb :
+		    pmap_invalidate_page_pcid_noinvpcid_cb);
+	return (pmap_invalidate_page_nopcid_cb);
 }
 
 static void
 pmap_invalidate_page_curcpu_cb(pmap_t pmap, vm_offset_t va,
     vm_offset_t addr2 __unused)
 {
-
 	if (pmap == kernel_pmap) {
 		invlpg(va);
-	} else {
-		if (pmap == PCPU_GET(curpmap))
-			invlpg(va);
-		pmap_invalidate_page_mode(pmap, va);
+	} else if (pmap == PCPU_GET(curpmap)) {
+		invlpg(va);
+		pmap_invalidate_page_cb(pmap, va);
 	}
 }
 
 void
 pmap_invalidate_page(pmap_t pmap, vm_offset_t va)
 {
-
 	if (pmap_type_guest(pmap)) {
 		pmap_invalidate_ept(pmap);
 		return;
@@ -2442,6 +2460,7 @@ pmap_invalidate_page(pmap_t pmap, vm_offset_t va)
 	KASSERT(pmap->pm_type == PT_X86,
 	    ("pmap_invalidate_page: invalid type %d", pmap->pm_type));
 
+	pmap_invalidate_preipi(pmap);
 	smp_masked_invlpg(pmap_invalidate_cpu_mask(pmap), va, pmap,
 	    pmap_invalidate_page_curcpu_cb);
 }
@@ -2450,73 +2469,62 @@ pmap_invalidate_page(pmap_t pmap, vm_offset_t va)
 #define	PMAP_INVLPG_THRESHOLD	(4 * 1024 * PAGE_SIZE)
 
 static void
-pmap_invalidate_range_pcid(pmap_t pmap, vm_offset_t sva, vm_offset_t eva,
+pmap_invalidate_range_pcid_cb(pmap_t pmap, vm_offset_t sva, vm_offset_t eva,
     const bool invpcid_works1)
 {
 	struct invpcid_descr d;
 	uint64_t kcr3, ucr3;
 	uint32_t pcid;
-	u_int cpuid, i;
+	u_int cpuid;
+
+	CRITICAL_ASSERT(curthread);
+
+	if (pmap != PCPU_GET(curpmap) ||
+	    pmap->pm_ucr3 == PMAP_NO_CR3)
+		return;
 
 	cpuid = PCPU_GET(cpuid);
-	if (pmap == PCPU_GET(curpmap)) {
-		if (pmap->pm_ucr3 != PMAP_NO_CR3) {
-			critical_enter();
-			pcid = pmap->pm_pcids[cpuid].pm_pcid;
-			if (invpcid_works1) {
-				d.pcid = pcid | PMAP_PCID_USER_PT;
-				d.pad = 0;
-				d.addr = sva;
-				for (; d.addr < eva; d.addr += PAGE_SIZE)
-					invpcid(&d, INVPCID_ADDR);
-			} else {
-				kcr3 = pmap->pm_cr3 | pcid | CR3_PCID_SAVE;
-				ucr3 = pmap->pm_ucr3 | pcid |
-				    PMAP_PCID_USER_PT | CR3_PCID_SAVE;
-				pmap_pti_pcid_invlrng(ucr3, kcr3, sva, eva);
-			}
-			critical_exit();
-		}
-	} else
-		pmap->pm_pcids[cpuid].pm_gen = 0;
-	CPU_FOREACH(i) {
-		if (cpuid != i)
-			pmap->pm_pcids[i].pm_gen = 0;
+	pcid = pmap->pm_pcids[cpuid].pm_pcid;
+	if (invpcid_works1) {
+		d.pcid = pcid | PMAP_PCID_USER_PT;
+		d.pad = 0;
+		for (d.addr = sva; d.addr < eva; d.addr += PAGE_SIZE)
+			invpcid(&d, INVPCID_ADDR);
+	} else {
+		kcr3 = pmap->pm_cr3 | pcid | CR3_PCID_SAVE;
+		ucr3 = pmap->pm_ucr3 | pcid | PMAP_PCID_USER_PT | CR3_PCID_SAVE;
+		pmap_pti_pcid_invlrng(ucr3, kcr3, sva, eva);
 	}
-	/* See the comment in pmap_invalidate_page_pcid(). */
-	atomic_thread_fence_seq_cst();
 }
 
 static void
-pmap_invalidate_range_pcid_invpcid(pmap_t pmap, vm_offset_t sva,
+pmap_invalidate_range_pcid_invpcid_cb(pmap_t pmap, vm_offset_t sva,
     vm_offset_t eva)
 {
-
-	pmap_invalidate_range_pcid(pmap, sva, eva, true);
+	pmap_invalidate_range_pcid_cb(pmap, sva, eva, true);
 }
 
 static void
-pmap_invalidate_range_pcid_noinvpcid(pmap_t pmap, vm_offset_t sva,
+pmap_invalidate_range_pcid_noinvpcid_cb(pmap_t pmap, vm_offset_t sva,
    vm_offset_t eva)
 {
-
-	pmap_invalidate_range_pcid(pmap, sva, eva, false);
+	pmap_invalidate_range_pcid_cb(pmap, sva, eva, false);
 }
 
 static void
-pmap_invalidate_range_nopcid(pmap_t pmap, vm_offset_t sva, vm_offset_t eva)
+pmap_invalidate_range_nopcid_cb(pmap_t pmap __unused, vm_offset_t sva __unused,
+    vm_offset_t eva __unused)
 {
 }
 
-DEFINE_IFUNC(static, void, pmap_invalidate_range_mode, (pmap_t, vm_offset_t,
+DEFINE_IFUNC(static, void, pmap_invalidate_range_cb, (pmap_t, vm_offset_t,
    vm_offset_t), static)
 {
-
 	if (pmap_pcid_enabled)
-		return (invpcid_works ? pmap_invalidate_range_pcid_invpcid :
-		    pmap_invalidate_range_pcid_noinvpcid);
-	return (pmap_invalidate_range_nopcid);
+		return (invpcid_works ? pmap_invalidate_range_pcid_invpcid_cb :
+		    pmap_invalidate_range_pcid_noinvpcid_cb);
+	return (pmap_invalidate_range_nopcid_cb);
 }
 
 static void
@@ -2527,19 +2535,16 @@ pmap_invalidate_range_curcpu_cb(pmap_t pmap, vm_offset_t sva, vm_offset_t eva)
 	if (pmap == kernel_pmap) {
 		for (addr = sva; addr < eva; addr += PAGE_SIZE)
 			invlpg(addr);
-	} else {
-		if (pmap == PCPU_GET(curpmap)) {
-			for (addr = sva; addr < eva; addr += PAGE_SIZE)
-				invlpg(addr);
-		}
-		pmap_invalidate_range_mode(pmap, sva, eva);
+	} else if (pmap == PCPU_GET(curpmap)) {
+		for (addr = sva; addr < eva; addr += PAGE_SIZE)
+			invlpg(addr);
+		pmap_invalidate_range_cb(pmap, sva, eva);
 	}
 }
 
 void
 pmap_invalidate_range(pmap_t pmap, vm_offset_t sva, vm_offset_t eva)
 {
-
 	if (eva - sva >= PMAP_INVLPG_THRESHOLD) {
 		pmap_invalidate_all(pmap);
 		return;
@@ -2553,17 +2558,18 @@ pmap_invalidate_range(pmap_t pmap, vm_offset_t sva, vm_offset_t eva)
 	KASSERT(pmap->pm_type == PT_X86,
 	    ("pmap_invalidate_range: invalid type %d", pmap->pm_type));
 
+	pmap_invalidate_preipi(pmap);
 	smp_masked_invlpg_range(pmap_invalidate_cpu_mask(pmap), sva, eva, pmap,
 	    pmap_invalidate_range_curcpu_cb);
 }
 
 static inline void
-pmap_invalidate_all_pcid(pmap_t pmap, bool invpcid_works1)
+pmap_invalidate_all_pcid_cb(pmap_t pmap, bool invpcid_works1)
 {
 	struct invpcid_descr d;
 	uint64_t kcr3, ucr3;
 	uint32_t pcid;
-	u_int cpuid, i;
+	u_int cpuid;
 
 	if (pmap == kernel_pmap) {
 		if (invpcid_works1) {
@@ -2572,87 +2578,72 @@ pmap_invalidate_all_pcid(pmap_t pmap, bool invpcid_works1)
 		} else {
 			invltlb_glob();
 		}
-	} else {
+	} else if (pmap == PCPU_GET(curpmap)) {
+		CRITICAL_ASSERT(curthread);
 		cpuid = PCPU_GET(cpuid);
-		if (pmap == PCPU_GET(curpmap)) {
-			critical_enter();
-			pcid = pmap->pm_pcids[cpuid].pm_pcid;
-			if (invpcid_works1) {
-				d.pcid = pcid;
-				d.pad = 0;
-				d.addr = 0;
+
+		pcid = pmap->pm_pcids[cpuid].pm_pcid;
+		if (invpcid_works1) {
+			d.pcid = pcid;
+			d.pad = 0;
+			d.addr = 0;
+			invpcid(&d, INVPCID_CTX);
+			if (pmap->pm_ucr3 != PMAP_NO_CR3) {
+				d.pcid |= PMAP_PCID_USER_PT;
 				invpcid(&d, INVPCID_CTX);
-				if (pmap->pm_ucr3 != PMAP_NO_CR3) {
-					d.pcid |= PMAP_PCID_USER_PT;
-					invpcid(&d, INVPCID_CTX);
-				}
+			}
+		} else {
+			kcr3 = pmap->pm_cr3 | pcid;
+			ucr3 = pmap->pm_ucr3;
+			if (ucr3 != PMAP_NO_CR3) {
+				ucr3 |= pcid | PMAP_PCID_USER_PT;
+				pmap_pti_pcid_invalidate(ucr3, kcr3);
 			} else {
-				kcr3 = pmap->pm_cr3 | pcid;
-				ucr3 = pmap->pm_ucr3;
-				if (ucr3 != PMAP_NO_CR3) {
-					ucr3 |= pcid | PMAP_PCID_USER_PT;
-					pmap_pti_pcid_invalidate(ucr3, kcr3);
-				} else {
-					load_cr3(kcr3);
-				}
+				load_cr3(kcr3);
 			}
-			critical_exit();
-		} else
-			pmap->pm_pcids[cpuid].pm_gen = 0;
-		CPU_FOREACH(i) {
-			if (cpuid != i)
-				pmap->pm_pcids[i].pm_gen = 0;
 		}
 	}
-	/* See the comment in pmap_invalidate_page_pcid(). */
-	atomic_thread_fence_seq_cst();
 }
 
 static void
-pmap_invalidate_all_pcid_invpcid(pmap_t pmap)
+pmap_invalidate_all_pcid_invpcid_cb(pmap_t pmap)
 {
-
-	pmap_invalidate_all_pcid(pmap, true);
+	pmap_invalidate_all_pcid_cb(pmap, true);
 }
 
 static void
-pmap_invalidate_all_pcid_noinvpcid(pmap_t pmap)
+pmap_invalidate_all_pcid_noinvpcid_cb(pmap_t pmap)
 {
-
-	pmap_invalidate_all_pcid(pmap, false);
+	pmap_invalidate_all_pcid_cb(pmap, false);
 }
 
 static void
-pmap_invalidate_all_nopcid(pmap_t pmap)
+pmap_invalidate_all_nopcid_cb(pmap_t pmap)
 {
-
 	if (pmap == kernel_pmap)
 		invltlb_glob();
 	else if (pmap == PCPU_GET(curpmap))
 		invltlb();
 }
 
-DEFINE_IFUNC(static, void, pmap_invalidate_all_mode, (pmap_t), static)
+DEFINE_IFUNC(static, void, pmap_invalidate_all_cb, (pmap_t), static)
 {
-
 	if (pmap_pcid_enabled)
-		return (invpcid_works ? pmap_invalidate_all_pcid_invpcid :
-		    pmap_invalidate_all_pcid_noinvpcid);
-	return (pmap_invalidate_all_nopcid);
+		return (invpcid_works ? pmap_invalidate_all_pcid_invpcid_cb :
+		    pmap_invalidate_all_pcid_noinvpcid_cb);
+	return (pmap_invalidate_all_nopcid_cb);
 }
 
 static void
 pmap_invalidate_all_curcpu_cb(pmap_t pmap, vm_offset_t addr1 __unused,
     vm_offset_t addr2 __unused)
 {
-
-	pmap_invalidate_all_mode(pmap);
+	pmap_invalidate_all_cb(pmap);
 }
 
 void
 pmap_invalidate_all(pmap_t pmap)
 {
-
 	if (pmap_type_guest(pmap)) {
 		pmap_invalidate_ept(pmap);
 		return;
@@ -2661,6 +2652,7 @@ pmap_invalidate_all(pmap_t pmap)
 	KASSERT(pmap->pm_type == PT_X86,
 	    ("pmap_invalidate_all: invalid type %d", pmap->pm_type));
 
+	pmap_invalidate_preipi(pmap);
 	smp_masked_invltlb(pmap_invalidate_cpu_mask(pmap), pmap,
 	    pmap_invalidate_all_curcpu_cb);
 }
@@ -2669,14 +2661,13 @@ static void
 pmap_invalidate_cache_curcpu_cb(pmap_t pmap __unused, vm_offset_t va __unused,
     vm_offset_t addr2 __unused)
 {
-
 	wbinvd();
 }
 
 void
 pmap_invalidate_cache(void)
 {
-
+	sched_pin();
 	smp_cache_flush(pmap_invalidate_cache_curcpu_cb);
 }
 
diff --git a/sys/x86/x86/mp_x86.c b/sys/x86/x86/mp_x86.c
index 73565360007b..d7cd50e99eed 100644
--- a/sys/x86/x86/mp_x86.c
+++ b/sys/x86/x86/mp_x86.c
@@ -1649,13 +1649,16 @@ volatile uint32_t smp_tlb_generation;
 
 /*
  * Used by pmap to request invalidation of TLB or cache on local and
  * remote processors.  Mask provides the set of remote CPUs which are
- * to be signalled with the IPI specified by vector.  The curcpu_cb
- * callback is invoked on the calling CPU while waiting for remote
- * CPUs to complete the operation.
+ * to be signalled with the IPI specified by vector.  The curcpu_cb
+ * callback is invoked on the calling CPU in a critical section while
+ * waiting for remote CPUs to complete the operation.
  *
  * The callback function is called unconditionally on the caller's
  * underlying processor, even when this processor is not set in the
  * mask.  So, the callback function must be prepared to handle such
  * spurious invocations.
+ *
+ * This function must be called with the thread pinned, and it unpins on
+ * completion.
  */
 static void
 smp_targeted_tlb_shootdown(cpuset_t mask, u_int vector, pmap_t pmap,
@@ -1670,23 +1673,21 @@ smp_targeted_tlb_shootdown(cpuset_t mask, u_int vector, pmap_t pmap,
 	 * It is not necessary to signal other CPUs while booting or
 	 * when in the debugger.
 	 */
-	if (kdb_active || panicstr != NULL || !smp_started) {
-		curcpu_cb(pmap, addr1, addr2);
-		return;
-	}
+	if (kdb_active || panicstr != NULL || !smp_started)
+		goto local_cb;
 
-	sched_pin();
+	KASSERT(curthread->td_pinned > 0, ("curthread not pinned"));
 
 	/*
 	 * Check for other cpus.  Return if none.
 	 */
 	if (CPU_ISFULLSET(&mask)) {
 		if (mp_ncpus <= 1)
-			goto nospinexit;
+			goto local_cb;
 	} else {
 		CPU_CLR(PCPU_GET(cpuid), &mask);
 		if (CPU_EMPTY(&mask))
-			goto nospinexit;
+			goto local_cb;
 	}
 
 	if (!(read_eflags() & PSL_I))
@@ -1718,13 +1719,22 @@ smp_targeted_tlb_shootdown(cpuset_t mask, u_int vector, pmap_t pmap,
 		while (*p_cpudone != generation)
 			ia32_pause();
 	}
-	mtx_unlock_spin(&smp_ipi_mtx);
+
+	/*
+	 * Unpin before unlocking smp_ipi_mtx.  If the thread owes
+	 * preemption, this allows scheduler to select thread on any
+	 * CPU from its cpuset.
+	 */
 	sched_unpin();
+	mtx_unlock_spin(&smp_ipi_mtx);
+
 	return;
-nospinexit:
+local_cb:
+	critical_enter();
 	curcpu_cb(pmap, addr1, addr2);
 	sched_unpin();
+	critical_exit();
 }
 
 void