git: d5ebaa6f8f85 - stable/13 - uma: Improve M_USE_RESERVE handling in keg_fetch_slab()

From: Mark Johnston <markj_at_FreeBSD.org>
Date: Mon, 15 Nov 2021 14:07:37 UTC
The branch stable/13 has been updated by markj:

URL: https://cgit.FreeBSD.org/src/commit/?id=d5ebaa6f8f850bb6f6273f01386832efcb295827

commit d5ebaa6f8f850bb6f6273f01386832efcb295827
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2021-11-01 13:27:35 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2021-11-15 14:06:54 +0000

    uma: Improve M_USE_RESERVE handling in keg_fetch_slab()
    
    M_USE_RESERVE is used in a couple of places in the VM to avoid unbounded
    recursion when the direct map is not available, as is the case on 32-bit
    platforms or when certain kernel sanitizers (KASAN and KMSAN) are
    enabled.  For example, to allocate KVA, the kernel might allocate a
    kernel map entry, which might require a new slab, which requires KVA.
    
    For these zones, we use uma_prealloc() to populate a reserve of items,
    and then in certain serialized contexts M_USE_RESERVE can be used to
    guarantee a successful allocation.  uma_prealloc() allocates the
    requested number of items, distributing them evenly among NUMA domains.
    Thus, in a first-touch zone, to satisfy an M_USE_RESERVE allocation we
    might have to check the slab lists of domains other than the current
    one to provide the semantics expected by consumers.
    
    So, try harder to find an item if M_USE_RESERVE is specified and the keg
    doesn't have anything for the current (first-touch) domain.
    Specifically, fall back to a round-robin slab allocation.  This change
    fixes boot-time panics on NUMA systems with KASAN or KMSAN enabled.[1]
    
    Alternately we could have uma_prealloc() allocate the requested number
    of items for each domain, but for some existing consumers this would be
    quite wasteful.  In general I think keg_fetch_slab() should try harder
    to find free slabs in other domains before trying to allocate fresh
    ones, but let's limit this to M_USE_RESERVE for now.
    
    Also fix a separate problem that I noticed: in a non-round-robin slab
    allocation with M_WAITOK, rather than sleeping after a failed slab
    allocation we simply try again.  Call vm_wait_domain() before retrying.
    
    Reported by:    mjg, tuexen [1]
    Reviewed by:    alc
    Sponsored by:   The FreeBSD Foundation
    
    (cherry picked from commit fab343a7168a2f033073bb5f65b5af17d9092c6f)
---
 sys/vm/uma_core.c | 33 ++++++++++++++++++++++++---------
 1 file changed, 24 insertions(+), 9 deletions(-)
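
The fallback described in the commit message can be illustrated with a
small userspace model.  This is only a sketch under stated assumptions,
not the kernel code: struct toy_keg, toy_take(), toy_fetch(), NDOMAINS,
ANYDOMAIN and F_USE_RESERVE are hypothetical stand-ins (for a keg with
per-domain slab lists, UMA_ANYDOMAIN and M_USE_RESERVE respectively),
and a plain counter per domain replaces the real slab lists.

```c
#include <stdbool.h>

#define NDOMAINS	4
#define ANYDOMAIN	(-1)	/* hypothetical stand-in for UMA_ANYDOMAIN */
#define F_USE_RESERVE	0x1	/* hypothetical stand-in for M_USE_RESERVE */

/* Toy model: each domain holds a count of preallocated reserve items. */
struct toy_keg {
	int free_items[NDOMAINS];
	int rr_next;		/* where the next round-robin scan starts */
};

/* Take one item from the given domain; return true on success. */
static bool
toy_take(struct toy_keg *keg, int domain)
{
	if (keg->free_items[domain] == 0)
		return (false);
	keg->free_items[domain]--;
	return (true);
}

/*
 * Return the domain that satisfied the request, or -1 on failure.
 * A first-touch request (rdomain != ANYDOMAIN) that finds its own
 * domain empty is restarted as a round-robin request only when the
 * caller passed F_USE_RESERVE.
 */
static int
toy_fetch(struct toy_keg *keg, int rdomain, int flags)
{
	int domain, i;

restart:
	if (rdomain != ANYDOMAIN) {
		if (toy_take(keg, rdomain))
			return (rdomain);
		if ((flags & F_USE_RESERVE) != 0) {
			/* Drain reserves held by other domains. */
			rdomain = ANYDOMAIN;
			goto restart;
		}
		return (-1);
	}

	/* Round-robin: try every domain once, starting at rr_next. */
	for (i = 0; i < NDOMAINS; i++) {
		domain = (keg->rr_next + i) % NDOMAINS;
		if (toy_take(keg, domain)) {
			keg->rr_next = (domain + 1) % NDOMAINS;
			return (domain);
		}
	}
	return (-1);
}
```

In this model, a reserve spread across domains 1 and 3 is invisible to a
first-touch request for domain 0 unless F_USE_RESERVE is passed, which
mirrors why a reserve distributed evenly by uma_prealloc() could not
previously guarantee a successful allocation on a NUMA system.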

diff --git a/sys/vm/uma_core.c b/sys/vm/uma_core.c
index 3e4e3c7c4ce1..1fb066d71762 100644
--- a/sys/vm/uma_core.c
+++ b/sys/vm/uma_core.c
@@ -3858,6 +3858,9 @@ keg_fetch_slab(uma_keg_t keg, uma_zone_t zone, int rdomain, const int flags)
 	int aflags, domain;
 	bool rr;
 
+	KASSERT((flags & (M_WAITOK | M_NOVM)) != (M_WAITOK | M_NOVM),
+	    ("%s: invalid flags %#x", __func__, flags));
+
 restart:
 	/*
 	 * Use the keg's policy if upper layers haven't already specified a
@@ -3883,17 +3886,29 @@ restart:
 			return (slab);
 
 		/*
-		 * M_NOVM means don't ask at all!
+		 * M_NOVM is used to break the recursion that can otherwise
+		 * occur if low-level memory management routines use UMA.
 		 */
-		if (flags & M_NOVM)
-			break;
+		if ((flags & M_NOVM) == 0) {
+			slab = keg_alloc_slab(keg, zone, domain, flags, aflags);
+			if (slab != NULL)
+				return (slab);
+		}
 
-		slab = keg_alloc_slab(keg, zone, domain, flags, aflags);
-		if (slab != NULL)
-			return (slab);
-		if (!rr && (flags & M_WAITOK) == 0)
-			break;
-		if (rr && vm_domainset_iter_policy(&di, &domain) != 0) {
+		if (!rr) {
+			if ((flags & M_USE_RESERVE) != 0) {
+				/*
+				 * Drain reserves from other domains before
+				 * giving up or sleeping.  It may be useful to
+				 * support per-domain reserves eventually.
+				 */
+				rdomain = UMA_ANYDOMAIN;
+				goto restart;
+			}
+			if ((flags & M_WAITOK) == 0)
+				break;
+			vm_wait_domain(domain);
+		} else if (vm_domainset_iter_policy(&di, &domain) != 0) {
 			if ((flags & M_WAITOK) != 0) {
 				vm_wait_doms(&keg->uk_dr.dr_policy->ds_mask, 0);
 				goto restart;