git: fab343a7168a - main - uma: Improve M_USE_RESERVE handling in keg_fetch_slab()

From: Mark Johnston <markj_at_FreeBSD.org>
Date: Mon, 01 Nov 2021 13:52:16 UTC

The branch main has been updated by markj:

URL: https://cgit.FreeBSD.org/src/commit/?id=fab343a7168a2f033073bb5f65b5af17d9092c6f

commit fab343a7168a2f033073bb5f65b5af17d9092c6f
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2021-11-01 13:27:35 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2021-11-01 13:51:18 +0000

    uma: Improve M_USE_RESERVE handling in keg_fetch_slab()
    
    M_USE_RESERVE is used in a couple of places in the VM to avoid unbounded
    recursion when the direct map is not available, as is the case on 32-bit
    platforms or when certain kernel sanitizers (KASAN and KMSAN) are
    enabled.  For example, to allocate KVA, the kernel might allocate a
    kernel map entry, which might require a new slab, which requires KVA.
    
    For these zones, we use uma_prealloc() to populate a reserve of items,
    and then in certain serialized contexts M_USE_RESERVE can be used to
    guarantee a successful allocation.  uma_prealloc() allocates the
    requested number of items, distributing them evenly among NUMA domains.
    Thus, in a first-touch zone, to satisfy an M_USE_RESERVE allocation we
    might have to check the slab lists of domains other than the current one
    to provide the semantics expected by consumers.
    
    So, try harder to find an item if M_USE_RESERVE is specified and the keg
    doesn't have anything for the current (first-touch) domain.
    Specifically, fall back to a round-robin slab allocation.  This change
    fixes boot-time panics on NUMA systems with KASAN or KMSAN enabled.[1]
    
    Alternatively, we could have uma_prealloc() allocate the requested number
    of items for each domain, but for some existing consumers this would be
    quite wasteful.  In general I think keg_fetch_slab() should try harder
    to find free slabs in other domains before trying to allocate fresh
    ones, but let's limit this to M_USE_RESERVE for now.
    
    Also fix a separate problem that I noticed: in a non-round-robin slab
    allocation with M_WAITOK, rather than sleeping after a failed slab
    allocation we simply try again.  Call vm_wait_domain() before retrying.
    
    Reported by:    mjg, tuexen [1]
    Reviewed by:    alc
    MFC after:      2 weeks
    Sponsored by:   The FreeBSD Foundation
    Differential Revision:  https://reviews.freebsd.org/D32515
---
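Illustrative note (not part of the patch): the reserve fallback described in
the commit message can be modelled in user space roughly as follows.  This is
a minimal sketch, not the UMA implementation.  Every name in it (toy_keg,
toy_prealloc, toy_fetch, TOY_USE_RESERVE, TOY_MAXDOMAINS) is hypothetical, the
handling of a remainder in toy_prealloc() is an assumption, and the linear
scan over the other domains merely stands in for the restart with
UMA_ANYDOMAIN and the keg's round-robin policy.

```c
#include <assert.h>

#define TOY_MAXDOMAINS	8
#define TOY_USE_RESERVE	0x1	/* hypothetical stand-in for M_USE_RESERVE */

/* Hypothetical model of a keg: a count of free reserve items per domain. */
struct toy_keg {
	int ndomains;
	int nfree[TOY_MAXDOMAINS];
};

/*
 * Spread nitems across the domains as evenly as possible.  Giving the
 * remainder to the lowest-numbered domains is an assumption of this sketch.
 */
void
toy_prealloc(struct toy_keg *keg, int ndomains, int nitems)
{
	assert(ndomains > 0 && ndomains <= TOY_MAXDOMAINS);
	assert(nitems >= 0);
	keg->ndomains = ndomains;
	for (int d = 0; d < ndomains; d++)
		keg->nfree[d] = nitems / ndomains +
		    (d < nitems % ndomains ? 1 : 0);
}

/*
 * Take one item, preferring "domain" (the first-touch domain).  Returns the
 * domain that supplied the item, or -1 if nothing could be found.
 */
int
toy_fetch(struct toy_keg *keg, int domain, int flags)
{
	assert(domain >= 0 && domain < keg->ndomains);

	/* First-touch: try the preferred domain. */
	if (keg->nfree[domain] > 0) {
		keg->nfree[domain]--;
		return (domain);
	}

	/* Without the reserve flag, give up as a strict first-touch would. */
	if ((flags & TOY_USE_RESERVE) == 0)
		return (-1);

	/* Drain reserves from the other domains before giving up. */
	for (int i = 1; i < keg->ndomains; i++) {
		int d = (domain + i) % keg->ndomains;

		if (keg->nfree[d] > 0) {
			keg->nfree[d]--;
			return (d);
		}
	}
	return (-1);
}
```

With four domains and a reserve of four items, a second request from domain 0
fails without TOY_USE_RESERVE but succeeds with it by taking an item from
another domain, which is the behaviour the commit message asks of
keg_fetch_slab().
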
 sys/vm/uma_core.c | 33 ++++++++++++++++++++++++---------
 1 file changed, 24 insertions(+), 9 deletions(-)

diff --git a/sys/vm/uma_core.c b/sys/vm/uma_core.c
index 35ed473da5ca..de9605a28bb6 100644
--- a/sys/vm/uma_core.c
+++ b/sys/vm/uma_core.c
@@ -3914,6 +3914,9 @@ keg_fetch_slab(uma_keg_t keg, uma_zone_t zone, int rdomain, const int flags)
 	int aflags, domain;
 	bool rr;
 
+	KASSERT((flags & (M_WAITOK | M_NOVM)) != (M_WAITOK | M_NOVM),
+	    ("%s: invalid flags %#x", __func__, flags));
+
 restart:
 	/*
 	 * Use the keg's policy if upper layers haven't already specified a
@@ -3939,17 +3942,29 @@ restart:
 			return (slab);
 
 		/*
-		 * M_NOVM means don't ask at all!
+		 * M_NOVM is used to break the recursion that can otherwise
+		 * occur if low-level memory management routines use UMA.
 		 */
-		if (flags & M_NOVM)
-			break;
+		if ((flags & M_NOVM) == 0) {
+			slab = keg_alloc_slab(keg, zone, domain, flags, aflags);
+			if (slab != NULL)
+				return (slab);
+		}
 
-		slab = keg_alloc_slab(keg, zone, domain, flags, aflags);
-		if (slab != NULL)
-			return (slab);
-		if (!rr && (flags & M_WAITOK) == 0)
-			break;
-		if (rr && vm_domainset_iter_policy(&di, &domain) != 0) {
+		if (!rr) {
+			if ((flags & M_USE_RESERVE) != 0) {
+				/*
+				 * Drain reserves from other domains before
+				 * giving up or sleeping.  It may be useful to
+				 * support per-domain reserves eventually.
+				 */
+				rdomain = UMA_ANYDOMAIN;
+				goto restart;
+			}
+			if ((flags & M_WAITOK) == 0)
+				break;
+			vm_wait_domain(domain);
+		} else if (vm_domainset_iter_policy(&di, &domain) != 0) {
 			if ((flags & M_WAITOK) != 0) {
 				vm_wait_doms(&keg->uk_dr.dr_policy->ds_mask, 0);
 				goto restart;