Re: swap_pager: cannot allocate bio
- In reply to: Chris Ross : "Re: swap_pager: cannot allocate bio"
Date: Sat, 20 Nov 2021 18:23:06 UTC
On Fri, Nov 19, 2021 at 10:35:52PM -0500, Chris Ross wrote:
> (Sorry that the subject on this thread may not be relevant any more, but I don’t want to disconnect the thread.)
>
> > On Nov 15, 2021, at 13:17, Chris Ross <cross+freebsd@distal.com> wrote:
> >> On Nov 15, 2021, at 10:08, Andriy Gapon <avg@freebsd.org> wrote:
> >
> >> Yes, I propose to remove the wait for ARC evictions from arc_lowmem().
> >>
> >> Another thing that may help a bit is having a greater "slack" between a threshold where the page daemon starts paging out and a threshold where memory allocations start to wait (via vm_wait_domain).
> >>
> >> Also, I think that for a long time we had a problem (but not sure if it's still present) where allocations succeeded without waiting until the free memory went below certain threshold M, but once a thread started waiting in vm_wait it would not be woken up until the free memory went above another threshold N. And the problem was that N >> M. In other words, a lot of memory had to be freed (and not grabbed by other threads) before the waiting thread would be woken up.
> >
> > Thank you both for your input. Let me know if you’d like me to try anything; I’ll kick (reboot) the system and can build a new kernel whenever you’d like. I did get another procstat -kka out of it this morning, and the system has since become less responsive, but I assume that new procstat won’t show anything last night’s didn’t.
>
> I’m still having this issue. I rebooted the machine, fsck’d the disks, and got it running again. Again, it ran for ~50 hours before getting stuck. I got another procstat -kka off of it; let me know if you’d like a copy. It looks like the active processes are all in arc_wait_for_eviction. The pagedaemon is in arc_wait_for_eviction under arc_lowmem, but the python processes that were doing the real work don’t have arc_lowmem in their stacks, just arc_wait_for_eviction.
>
> Please let me know if there’s anything I can do to assist in finding a remedy for this. Thank you.
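Roughly, in userland-C terms, the wakeup hysteresis Andriy describes looks like the sketch below. It is illustrative only: M and N are made-up numbers standing in for the real paging thresholds, and the actual logic lives in the page allocator and vm_wait_domain().

/*
 * Illustrative userland model of the wakeup hysteresis described above.
 * M and N are invented values, not the real tunables.
 */
#include <stdbool.h>
#include <stdio.h>

#define M	100	/* below this many free pages, allocations start to wait */
#define N	1000	/* waiters are only woken once free pages exceed this */

static long free_pages = 150;

static bool
try_alloc(void)
{
	if (free_pages > M) {
		free_pages--;		/* plenty free: succeed immediately */
		return (true);
	}
	return (false);			/* caller would sleep in "vm_wait" */
}

static bool
may_wake_waiters(void)
{
	/* N >> M: a lot of memory must be freed before sleepers run again. */
	return (free_pages > N);
}

int
main(void)
{
	while (try_alloc())
		;
	printf("allocations block at %ld free pages; waiters wake only "
	    "above %d (currently %s)\n", free_pages, N,
	    may_wake_waiters() ? "awake" : "still asleep");
	return (0);
}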
Here is a patch which tries to address the proximate cause of the
problem. It would be helpful to know if it addresses the deadlocks
you're seeing. I tested it lightly by putting a NUMA system under
memory pressure using postgres.
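If postgres is inconvenient for testing, a throwaway pressure generator along the lines below should also push the machine into the paging path. This is only a suggestion and not part of the patch; the chunk size is arbitrary.

/*
 * Throwaway userland memory-pressure generator.  It keeps dirtying
 * anonymous memory until the pager has to work; run it alongside the
 * usual ZFS workload.
 */
#include <err.h>
#include <stdlib.h>
#include <unistd.h>

int
main(void)
{
	const size_t chunk = 256UL * 1024 * 1024;	/* 256 MB per step */
	long pgsz = sysconf(_SC_PAGESIZE);

	for (;;) {
		char *p = malloc(chunk);
		if (p == NULL)
			err(1, "malloc");
		/* Touch every page so the memory is actually backed. */
		for (size_t off = 0; off < chunk; off += (size_t)pgsz)
			p[off] = 1;
		/* Intentionally never freed: the point is sustained pressure. */
	}
}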
diff --git a/sys/contrib/openzfs/include/os/freebsd/spl/sys/kmem.h b/sys/contrib/openzfs/include/os/freebsd/spl/sys/kmem.h
index dc3b4f5d7877..4792a0b29ecf 100644
--- a/sys/contrib/openzfs/include/os/freebsd/spl/sys/kmem.h
+++ b/sys/contrib/openzfs/include/os/freebsd/spl/sys/kmem.h
@@ -45,7 +45,7 @@ MALLOC_DECLARE(M_SOLARIS);
#define POINTER_INVALIDATE(pp) (*(pp) = (void *)((uintptr_t)(*(pp)) | 0x1))
#define KM_SLEEP M_WAITOK
-#define KM_PUSHPAGE M_WAITOK
+#define KM_PUSHPAGE (M_WAITOK | M_USE_RESERVE) /* XXXMJ */
#define KM_NOSLEEP M_NOWAIT
#define KM_NORMALPRI 0
#define KMC_NODEBUG UMA_ZONE_NODUMP
diff --git a/sys/contrib/openzfs/module/zfs/arc.c b/sys/contrib/openzfs/module/zfs/arc.c
index 79e2d4381830..50cd45d76c52 100644
--- a/sys/contrib/openzfs/module/zfs/arc.c
+++ b/sys/contrib/openzfs/module/zfs/arc.c
@@ -4188,11 +4188,13 @@ arc_evict_state(arc_state_t *state, uint64_t spa, uint64_t bytes,
* pick up where we left off for each individual sublist, rather
* than starting from the tail each time.
*/
- markers = kmem_zalloc(sizeof (*markers) * num_sublists, KM_SLEEP);
+ markers = kmem_zalloc(sizeof (*markers) * num_sublists,
+ KM_SLEEP | KM_PUSHPAGE);
for (int i = 0; i < num_sublists; i++) {
multilist_sublist_t *mls;
- markers[i] = kmem_cache_alloc(hdr_full_cache, KM_SLEEP);
+ markers[i] = kmem_cache_alloc(hdr_full_cache,
+ KM_SLEEP | KM_PUSHPAGE);
/*
* A b_spa of 0 is used to indicate that this header is
diff --git a/sys/vm/uma_core.c b/sys/vm/uma_core.c
index 7b83d81a423d..3fc7859387e0 100644
--- a/sys/vm/uma_core.c
+++ b/sys/vm/uma_core.c
@@ -3932,7 +3932,8 @@ keg_fetch_slab(uma_keg_t keg, uma_zone_t zone, int rdomain, const int flags)
vm_domainset_iter_policy_ref_init(&di, &keg->uk_dr, &domain,
&aflags);
} else {
- aflags = flags;
+ aflags = (flags & M_USE_RESERVE) != 0 ?
+ (flags & ~M_WAITOK) | M_NOWAIT : flags;
domain = rdomain;
}