From: Chris Ross
Date: Fri, 12 Nov 2021 08:49:17 -0500
To: ronald-lists@klop.ws
Cc: freebsd-fs
Subject: Re: swap_pager: cannot allocate bio
List-Archive: https://lists.freebsd.org/archives/freebsd-fs
Message-Id: <7B41B7D7-0C74-4F87-A49C-A666DB970CC3@distal.com>
In-Reply-To: <42006135.15.1636709757975@mailrelay>
>> root@host:~ # screen
>> load: 0.07 cmd: csh 56116 [vmwait] 35.00r 0.00u 0.01s 0% 3984k
>> mi_switch+0xc1 _sleep+0x1cb vm_wait_doms+0xe2 vm_wait_domain+0x51 vm_domain_alloc_fail+0x86 vm_page_alloc_domain_after+0x7e uma_small_alloc+0x58 keg_alloc_slab+0xba zone_import+0xee zone_alloc_item+0x6f malloc+0x5d sigacts_alloc+0x1c fork1+0x9fb sys_fork+0x54 amd64_syscall+0x10c fast_syscall_common+0xf8
>> As before, ps and even mount and df work here on console. But, a “zpool status tank” will hang as before. A Ctrl+D on it
>> load: 0.00 cmd: zpool 62829 [aw.aew_cv] 37.89r 0.00u 0.00s 0% 6976k
>> mi_switch+0xc1 _cv_wait+0xf2 arc_wait_for_eviction+0x14a arc_get_data_impl+0xdb arc_hdr_alloc_abd+0xa6 arc_hdr_alloc+0x11e arc_read+0x4f4 dbuf_read+0xc08 dmu_buf_hold+0x46 zap_lookup_norm+0x35 zap_contains+0x26 vdev_rebuild_get_stats+0xac vdev_config_generate+0x3e9 vdev_config_generate+0x74f spa_config_generate+0x2a2 spa_open_common+0x25c spa_get_stats+0x4e zfs_ioc_pool_stats+0x22

> Hi,
>
> Interesting. The details of these stacktraces are unknown to me. But it looks like it is waiting for available memory in both cases. What is the memory usage of the system while all this is happening. Is it swapping a lot?
> And what is the real setup of the disks? Are things like GELI used (not that the stack shows that) or swap-on-zfs?

It’s pretty simple. No GELI, just three 3-disk raidz’s. And swap is a partition on a physical (ish: hardware RAID1) disk, which is also where the OS and everything other than the one large ZFS filesystem are.

> And is there something else interesting in the logs than "swap_pager: cannot allocate bio"? Maybe a reason why it can't allocate the bio.

Not that I saw. A new execution of procstat -kk (started yesterday), as well as a dmesg, both hang now.
They seem to be stuck with the same stack-trace as screen is. And the zpool status shows the same stack with Ctrl-T as before. Looking at the logs now: since I rebooted the system 24 hours ago, there are no kernel logs after the failure that began yesterday afternoon. Apparently, this is a reproducible problem; it takes a day or less to get stuck. So, that’s valuable in a way. ;-)

> I would not know a pointer on how to debug this except for checking tools like iostat, vmstat, etc.. Of course running 13-STABLE can give an interesting data point.

So, tl;dr: no data from the most recent hang other than what the stack-traces show. Not even the “cannot allocate bio” I saw two days ago after increasing swap size. I can take a look at 13-STABLE; when I give up on this and reboot (likely today), I’ll try building that.

           - Chris
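
For anyone comparing the two Ctrl-T stacks quoted in this message, here is a minimal sketch of how they could be split into frames and compared. It assumes each frame is a whitespace-separated `symbol+0xoffset` token, as in the output above. The helper name `parse_frames` is hypothetical and only for illustration; it is not part of any FreeBSD tool.

```python
import re

# One frame looks like "symbol+0xOFFSET" in the traces quoted above.
FRAME_RE = re.compile(r"([A-Za-z_][A-Za-z0-9_.]*)\+0x([0-9a-fA-F]+)")


def parse_frames(trace: str) -> list[tuple[str, int]]:
    """Return (symbol, offset) pairs in printed order (innermost frame first)."""
    return [(sym, int(off, 16)) for sym, off in FRAME_RE.findall(trace)]


# Stack of the hung csh forked by screen, copied from the message.
screen_trace = (
    "mi_switch+0xc1 _sleep+0x1cb vm_wait_doms+0xe2 vm_wait_domain+0x51 "
    "vm_domain_alloc_fail+0x86 vm_page_alloc_domain_after+0x7e "
    "uma_small_alloc+0x58 keg_alloc_slab+0xba zone_import+0xee "
    "zone_alloc_item+0x6f malloc+0x5d sigacts_alloc+0x1c fork1+0x9fb "
    "sys_fork+0x54 amd64_syscall+0x10c fast_syscall_common+0xf8"
)

# Stack of the hung "zpool status tank", copied from the message.
zpool_trace = (
    "mi_switch+0xc1 _cv_wait+0xf2 arc_wait_for_eviction+0x14a "
    "arc_get_data_impl+0xdb arc_hdr_alloc_abd+0xa6 arc_hdr_alloc+0x11e "
    "arc_read+0x4f4 dbuf_read+0xc08 dmu_buf_hold+0x46 zap_lookup_norm+0x35 "
    "zap_contains+0x26 vdev_rebuild_get_stats+0xac vdev_config_generate+0x3e9 "
    "vdev_config_generate+0x74f spa_config_generate+0x2a2 "
    "spa_open_common+0x25c spa_get_stats+0x4e zfs_ioc_pool_stats+0x22"
)

screen_syms = [sym for sym, _ in parse_frames(screen_trace)]
zpool_syms = [sym for sym, _ in parse_frames(zpool_trace)]

# Frames present in both stacks, in the order they appear in the screen stack.
shared = [sym for sym in screen_syms if sym in zpool_syms]

print("screen sleeps via:", screen_syms[1:5])
print("zpool sleeps via: ", zpool_syms[1:5])
print("shared frames:    ", shared)
```

The innermost frames show where each process sleeps: the fork from screen is in the VM page allocator's wait path (`vm_wait_doms`), while zpool is in `arc_wait_for_eviction`. Both are waits for memory, but they take different code paths.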