From nobody Fri Nov 12 16:15:31 2021
List-Id: Filesystems
List-Archive: https://lists.freebsd.org/archives/freebsd-fs
From: Warner Losh <wlosh@bsdimp.com>
Date: Fri, 12 Nov 2021 09:15:31 -0700
Subject: Re: swap_pager: cannot allocate bio
To: Chris Ross
Cc: Ronald Klop, freebsd-fs

On Fri, Nov 12, 2021, 6:49 AM Chris Ross wrote:

> >> root@host:~ # screen
> >> load: 0.07 cmd: csh 56116 [vmwait] 35.00r 0.00u 0.01s 0% 3984k
> >> mi_switch+0xc1 _sleep+0x1cb vm_wait_doms+0xe2 vm_wait_domain+0x51
> >> vm_domain_alloc_fail+0x86 vm_page_alloc_domain_after+0x7e
> >> uma_small_alloc+0x58 keg_alloc_slab+0xba zone_import+0xee
> >> zone_alloc_item+0x6f malloc+0x5d sigacts_alloc+0x1c fork1+0x9fb
> >> sys_fork+0x54 amd64_syscall+0x10c fast_syscall_common+0xf8
>
> As before, ps and even mount and df work here on console. But, a
> "zpool status tank" will hang as before. A Ctrl+T on it:
>
> >> load: 0.00 cmd: zpool 62829 [aw.aew_cv] 37.89r 0.00u 0.00s 0% 6976k
> >> mi_switch+0xc1 _cv_wait+0xf2 arc_wait_for_eviction+0x14a
> >> arc_get_data_impl+0xdb arc_hdr_alloc_abd+0xa6 arc_hdr_alloc+0x11e
> >> arc_read+0x4f4 dbuf_read+0xc08 dmu_buf_hold+0x46 zap_lookup_norm+0x35
> >> zap_contains+0x26 vdev_rebuild_get_stats+0xac vdev_config_generate+0x3e9
> >> vdev_config_generate+0x74f spa_config_generate+0x2a2 spa_open_common+0x25c
> >> spa_get_stats+0x4e zfs_ioc_pool_stats+0x22
>
> > Hi,
> >
> > Interesting. The details of these stack traces are unknown to me. But it
> > looks like it is waiting for available memory in both cases. What is the
> > memory usage of the system while all this is happening? Is it swapping a
> > lot?
> >
> > And what is the real setup of the disks? Are things like GELI used (not
> > that the stack shows that) or swap-on-ZFS?
>
> It's pretty simple. No GELI, just three 3-disk raidz's. And swap is a
> partition on a physical (ish: hardware RAID1) disk, which is also where
> the OS and everything other than the one large ZFS filesystem are.
>
> > And is there something else interesting in the logs than "swap_pager:
> > cannot allocate bio"? Maybe a reason why it can't allocate the bio.
>
> Not that I saw. A new execution of procstat -kk (started yesterday), as
> well as a dmesg, both hang now. They seem to be stuck with the same
> stack trace as screen is. And the zpool status shows the same stack with
> Ctrl-T as it has. Looking at the logs now: since I rebooted the system 24
> hours ago, there are no kernel logs after the failure that began yesterday
> afternoon. Apparently, this is a reproducible problem; it takes a day or
> less to get stuck. So, that's valuable in a way.
> ;-)
>
> > I would not know a pointer on how to debug this except for checking
> > tools like iostat, vmstat, etc. Of course running 13-STABLE can give an
> > interesting data point.
>
> So, tl;dr: no data from the most recent hang other than what the
> stack traces show. Not even the "cannot allocate bio" I saw two days ago
> after increasing swap size. I can take a look at 13-STABLE; when I give
> up on this and reboot (likely today) I'll try building that.
>
> - Chris

The root cause of this problem is well known. You have a memory shortage,
so you want to page out dirty pages to reclaim memory. However, there's not
enough memory to allocate the structures you need to do I/O, and so the
swapout I/O fails halfway down the stack, not being able to allocate a bio.
Some paths through the swapper cope with this well; other parts that
execute less often cope less well.

There are some hacks in the tree today to help with the GELI case: we
prioritize swapping I/O. But there's no g_alloc_bio_swapping() interface
for swapping I/O to get priority on allocating a bio to start with. Places
that use g_clone_bio() could have the clone's copy allocated from a special
swap pool, but that starts to get messy and isn't done today. And the upper
layers like geom_vfs and ZFS are inconsistent in allocations, so there's
work needed to make it robust in ZFS, but I have only a vague notion of
what's needed. At the very least, the swapping I/O that comes into the top
of ZFS won't have swapping I/O marked coming out the bottom, because the
BIO_SWAP flag is quite new.

So until then, swapping on zvols is fraught with deadlocks like this, and
in the past there's been a strong admonishment against it.

Warner