Re: swap_pager: cannot allocate bio

From: Chris Ross <cross+freebsd_at_distal.com>
Date: Fri, 12 Nov 2021 13:49:17 UTC
>> root@host:~ # screen
>> load: 0.07 cmd: csh 56116 [vmwait] 35.00r 0.00u 0.01s 0% 3984k
>> mi_switch+0xc1 _sleep+0x1cb vm_wait_doms+0xe2 vm_wait_domain+0x51 vm_domain_alloc_fail+0x86 vm_page_alloc_domain_after+0x7e uma_small_alloc+0x58 keg_alloc_slab+0xba zone_import+0xee zone_alloc_item+0x6f malloc+0x5d sigacts_alloc+0x1c fork1+0x9fb sys_fork+0x54 amd64_syscall+0x10c fast_syscall_common+0xf8
>>
>> As before, ps and even mount and df work here on console.  But a “zpool status tank” will hang as before.  A Ctrl+T on it shows:

>> load: 0.00 cmd: zpool 62829 [aw.aew_cv] 37.89r 0.00u 0.00s 0% 6976k
>> mi_switch+0xc1 _cv_wait+0xf2 arc_wait_for_eviction+0x14a arc_get_data_impl+0xdb arc_hdr_alloc_abd+0xa6 arc_hdr_alloc+0x11e arc_read+0x4f4 dbuf_read+0xc08 dmu_buf_hold+0x46 zap_lookup_norm+0x35 zap_contains+0x26 vdev_rebuild_get_stats+0xac vdev_config_generate+0x3e9 vdev_config_generate+0x74f spa_config_generate+0x2a2 spa_open_common+0x25c spa_get_stats+0x4e zfs_ioc_pool_stats+0x22

> Hi,
> 
> Interesting. The details of these stacktraces are unknown to me. But it looks like it is waiting for available memory in both cases. What is the memory usage of the system while all this is happening? Is it swapping a lot?
> And what is the real setup of the disks? Are things like GELI used (not that the stack shows that) or swap-on-zfs?

It’s pretty simple.  No GELI, just three 3-disk raidz’s.  And swap is a partition on a physical (ish: hardware RAID1) disk, which is also where the OS and everything other than the one large ZFS filesystem are.

> And is there something else interesting in the logs than "swap_pager: cannot allocate bio"? Maybe a reason why it can't allocate the bio.

Not that I saw.  A new execution of procstat -kk (started yesterday) and a dmesg both hang now.  They seem to be stuck with the same stack trace as screen.  And the zpool status still shows the same stack with Ctrl-T as before.  Looking at the logs now: since I rebooted the system 24 hours ago, there are no kernel logs after the failure that began yesterday afternoon.  Apparently this is a reproducible problem; it takes a day or less to get stuck.  So, that’s valuable in a way.  ;-)
 
> I would not know a pointer on how to debug this except for checking tools like iostat, vmstat, etc. Of course running 13-STABLE can give an interesting data point.

So, tl;dr: no data from the most recent hang other than what the stack traces show.  Not even the “cannot allocate bio” I saw two days ago after increasing swap size.  I can take a look at 13-STABLE; when I give up on this and reboot (likely today), I’ll try building that.
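
Since the stack traces are the only data from this hang, here is a rough sketch of how one might pull the wait channel out of a Ctrl-T status line and pick out the frames that suggest a memory wait.  The function names and the set of "memory wait" frames are my own assumptions, chosen from the two traces quoted above; this is not an official list or an existing tool.

```python
import re

# Assumption: these frame names (all taken from the two traces above) are
# treated as signs that a thread is waiting for memory to be freed.
MEMORY_WAIT_FRAMES = {
    "vm_wait_doms",
    "vm_wait_domain",
    "vm_domain_alloc_fail",
    "arc_wait_for_eviction",
}

# Matches the start of a status line such as
# "load: 0.07 cmd: csh 56116 [vmwait] 35.00r 0.00u 0.01s 0% 3984k"
STATUS_RE = re.compile(
    r"load: (?P<load>[\d.]+) cmd: (?P<cmd>\S+) (?P<pid>\d+) \[(?P<wchan>[^\]]+)\]"
)


def parse_status(line):
    """Return command, pid and wait channel from a status line, or None."""
    m = STATUS_RE.search(line)
    if m is None:
        return None
    return {"cmd": m["cmd"], "pid": int(m["pid"]), "wchan": m["wchan"]}


def frames(trace):
    """Split a 'func+0xoff func+0xoff ...' trace into bare function names."""
    return [tok.split("+", 1)[0] for tok in trace.split()]


def memory_wait_frames(trace):
    """Return the frames in the trace that are in MEMORY_WAIT_FRAMES."""
    return [f for f in frames(trace) if f in MEMORY_WAIT_FRAMES]


# Shortened copies of the two traces above (innermost frames only).
screen_trace = "mi_switch+0xc1 _sleep+0x1cb vm_wait_doms+0xe2 vm_wait_domain+0x51 vm_domain_alloc_fail+0x86"
zpool_trace = "mi_switch+0xc1 _cv_wait+0xf2 arc_wait_for_eviction+0x14a arc_get_data_impl+0xdb"

print(parse_status("load: 0.07 cmd: csh 56116 [vmwait] 35.00r 0.00u 0.01s 0% 3984k"))
print(memory_wait_frames(screen_trace))
print(memory_wait_frames(zpool_trace))
```

With the traces above, the fork from csh matches the vm_wait_* frames and the zpool thread matches arc_wait_for_eviction, which fits the reading that both are waiting on memory.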

         - Chris