Re: swap_pager: cannot allocate bio

From: Mark Johnston <markj_at_freebsd.org>
Date: Mon, 15 Nov 2021 14:50:51 UTC

On Mon, Nov 15, 2021 at 04:20:26PM +0200, Andriy Gapon wrote:
> On 15/11/2021 05:26, Chris Ross wrote:
> > A procstat -kka output is available (208kb of text, 1441 lines) at
> > https://pastebin.com/SvDcvRvb
> 
>     67 100542 pagedaemon          dom0                mi_switch+0xc1 
> _cv_wait+0xf2 arc_wait_for_eviction+0x1df arc_lowmem+0xca 
> vm_pageout_worker+0x3c4 vm_pageout+0x1d7 fork_exit+0x8a fork_trampoline+0xe
> 
> I was always of an opinion that waiting for the ARC reclaim in arc_lowmem was 
> wrong.  This shows an example of why it is so.
> 
> > An ssh of a top command completed and shows:
> > 
> > last pid: 91551;  load averages:  0.00,  0.02,  0.30  up 2+00:19:33    22:23:15
> > 40 processes:  1 running, 38 sleeping, 1 zombie
> > CPU:  3.9% user,  0.0% nice,  0.9% system,  0.0% interrupt, 95.2% idle
> > Mem: 58G Active, 210M Inact, 1989M Laundry, 52G Wired, 1427M Buf, 12G Free
> 
> To me it looks like there is still plenty of free memory.
> 
> I am not sure why vm_wait_domain (called by vm_page_alloc_noobj_domain) is not 
> waking up.

It's a deadlock: the page daemon is sleeping on the arc evict thread,
and the arc evict thread is waiting for memory:

 2561 100722 zfskern             arc_evict           
 mi_switch+0xc1 _sleep+0x1cb vm_wait_doms+0xe2 vm_wait_domain+0x51
 vm_page_alloc_noobj_domain+0x184 uma_small_alloc+0x62 keg_alloc_slab+0xb0
 zone_import+0xee zone_alloc_item+0x6f arc_evict_state+0x81 arc_evict_cb+0x483
 zthr_procedure+0xba fork_exit+0x8a fork_trampoline+0xe 

I presume this is from the marker allocations in arc_evict_state().
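
To make the cycle explicit, here is a toy model of the wait-for
relationships in the two stacks above.  It is only a sketch, not kernel
code: the enum, the array and has_cycle() are all hypothetical names, and
the third edge (free pages only show up once the page daemon makes
progress) is a simplifying assumption.

```c
#include <assert.h>

/* Toy wait-for graph; every name here is hypothetical, not a kernel API. */
enum actor { PAGEDAEMON, ARC_EVICT, FREE_PAGES, NACTORS };

/*
 * waits_on[a] is the actor that a is blocked on, or -1 if a can run.
 * Returns 1 if following the chain from start leads back to start.
 */
static int
has_cycle(const int waits_on[NACTORS], int start)
{
	int cur = start;

	assert(start >= 0 && start < NACTORS);
	for (int steps = 0; steps < NACTORS; steps++) {
		cur = waits_on[cur];
		if (cur < 0)
			return (0);	/* chain ends at a runnable actor */
		if (cur == start)
			return (1);	/* back where we began: deadlock */
	}
	return (0);
}
```

With PAGEDAEMON -> ARC_EVICT (arc_lowmem sleeping in
arc_wait_for_eviction), ARC_EVICT -> FREE_PAGES (vm_wait_domain) and
FREE_PAGES -> PAGEDAEMON, has_cycle() reports a cycle.  Remove the
ARC_EVICT edge, i.e. let the marker allocation succeed, and it goes away.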

The second problem is that UMA refuses to even try allocating from the
"wrong" NUMA domain, and that policy seems overly strict.  Relaxing it
alone would make the deadlock harder to hit, but I don't think it would
solve it completely.
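
Roughly, the difference between the strict policy and a relaxed one looks
like the sketch below.  This is not UMA code: NDOMAINS, the free_pages
array and both functions are made up for illustration, and a real
allocator tracks much more than a per-domain free count.

```c
#include <assert.h>
#include <stddef.h>

#define NDOMAINS 2	/* hypothetical domain count for the sketch */

/* Strict policy: only the preferred domain is considered. */
static int
pick_domain_strict(const size_t free_pages[NDOMAINS], int preferred)
{
	assert(preferred >= 0 && preferred < NDOMAINS);
	return (free_pages[preferred] > 0 ? preferred : -1);
}

/* Relaxed policy: try the preferred domain first, then the others. */
static int
pick_domain_fallback(const size_t free_pages[NDOMAINS], int preferred)
{
	assert(preferred >= 0 && preferred < NDOMAINS);
	for (int i = 0; i < NDOMAINS; i++) {
		int d = (preferred + i) % NDOMAINS;

		if (free_pages[d] > 0)
			return (d);
	}
	return (-1);	/* every domain is empty: caller still has to wait */
}
```

If domain 0 is empty and domain 1 has pages, the strict version returns -1
(the caller sleeps) while the fallback version returns 1.  When every
domain is empty both return -1, which is why relaxing the policy would
only narrow the window rather than close it.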

> Perhaps this is some sort of a NUMA related issue where one memory domain is 
> exhausted while other(s) still have a lot of memory.
> Or maybe it's something else but it must be some sort of a bug.
> 
> > ARC: 48G Total, 10G MFU, 38G MRU, 128K Anon, 106M Header, 23M Other
> >       46G Compressed, 46G Uncompressed, 1.00:1 Ratio
> > Swap: 425G Total, 3487M Used, 422G Free
> 
> 
> -- 
> Andriy Gapon