Re: swap_pager: cannot allocate bio

From: Chris Ross <cross+freebsd_at_distal.com>
Date: Mon, 15 Nov 2021 03:26:13 UTC

> On Nov 12, 2021, at 23:15, Chris Ross <cross+freebsd@distal.com> wrote:
> 
> I’ve built a stable/13 as of today, and updated the system.  I’ll see
> if the problem recurs; it usually takes about 24 hours to show.  If
> it does, I’ll see if I can run a procstat -kka and get it off of the system.

Happy Sunday, all.  So, I logged in this evening 48 hours after starting the
job that uses a lot of CPU and I/O to the ZFS pool.  The system seemed to
be working, and I thought stable/13 had just fixed it.  But after only a few
minutes of fooling around it started to show problems.  An ssh connection
hung, and new ones couldn’t be made.  Then they could, but one shell got
stuck in disk wait while others worked.  Very odd.  I logged into the
console and ran a procstat -kka.  Then I tried to ls -f a directory in the
large ZFS fs (/tank), which hung.  Ctrl-T on that shows:

load: 0.04. cmd: ls 87050 [aw.aew_cv] 41.13r 0.00u 0.00s 0% 2632k
mi_switch+0xc1 _cv_wait+0xf2 arc_wait_for_eviction+0x1df arc_get_data_impl+0x85 arc_hdr_alloc_abd+0x7b arc_read+0x6f7 dbuf_read+0xc5b dmu_buf_hold+0x46 zap_cursor_retrieve+0x163 zfs_freebsd_readdir+0x393 VOP_READDIR_APV+0x1f kern_getdirentries+0x1d9 sys_getdirentries+0x29 amd64_syscall+0x10c fast_syscall_common+0xf8

A procstat -kka output is available (208 KB of text, 1441 lines) at
https://pastebin.com/SvDcvRvb
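
In case it helps with comparing notes, a rough sketch of how this kind of
snapshot can be gathered (the arcstats sysctl is only a suggestion of
something else to look at, not output I captured above):

  procstat -kka > /var/tmp/procstat-kka.txt    # kernel stacks for every thread, all processes
  sysctl kstat.zfs.misc.arcstats                # ARC counters (size, eviction/throttle stats)
  top -b -d 1                                   # one batch-mode display of the usual top screen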

Running top over an ssh session did complete, and shows:

last pid: 91551;  load averages:  0.00,  0.02,  0.30  up 2+00:19:33    22:23:15
40 processes:  1 running, 38 sleeping, 1 zombie
CPU:  3.9% user,  0.0% nice,  0.9% system,  0.0% interrupt, 95.2% idle
Mem: 58G Active, 210M Inact, 1989M Laundry, 52G Wired, 1427M Buf, 12G Free
ARC: 48G Total, 10G MFU, 38G MRU, 128K Anon, 106M Header, 23M Other
     46G Compressed, 46G Uncompressed, 1.00:1 Ratio
Swap: 425G Total, 3487M Used, 422G Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
90996 root          1  22    0    21M  9368K select  22   0:00   0.10% sshd
89398 cross        23  52    0    97G    60G uwait    4  94.1H   0.00% python3.
55463 cross        18  20    0   301M    54M kqread  31   4:30   0.00% python3.
54338 cross         4  20    0    82M  9632K kqread  33   1:02   0.00% python3.
84083 ntpd          1  20    0    21M  1712K select  33   0:07   0.00% ntpd

I’d love to hear any thoughts.  Again, this is running stable/13-n248044-4a36455c417.

Thanks all.

                   - Chris