Subject: Re: An attempted test of main's "git: 2ad756a6bbb3" "merge openzfs/zfs@95f71c019" that did not go as planned
From: Mark Millard <marklmi@yahoo.com>
Date: Sun, 3 Sep 2023 23:35:52 -0700
To: Alexander Motin
Cc: dev-commits-src-main@freebsd.org, Current FreeBSD
Message-Id: <673A446E-6F94-451E-910F-079F678C5289@yahoo.com>
List-Archive: https://lists.freebsd.org/archives/freebsd-current

On Sep 3, 2023, at 22:06, Alexander Motin wrote:

> On 03.09.2023 22:54, Mark Millard wrote:
>> After that ^t produced the likes of:
>> load: 6.39 cmd: sh 4849 [tx->tx_quiesce_done_cv] 10047.33r 0.51u 121.32s 1% 13004k
>
> So the full state is not "tx->tx" but is actually "tx->tx_quiesce_done_cv", which means the thread is waiting for a new transaction to be opened, which means some previous one has to be quiesced and then synced.
>
>> #0 0xffffffff80b6f103 at mi_switch+0x173
>> #1 0xffffffff80bc0f24 at sleepq_switch+0x104
>> #2 0xffffffff80aec4c5 at _cv_wait+0x165
>> #3 0xffffffff82aba365 at txg_wait_open+0xf5
>> #4 0xffffffff82a11b81 at dmu_free_long_range+0x151
>
> Here it seems like the transaction commit is being waited on because of a large number of delete operations, which ZFS tries to spread across separate TXGs.

That fit the context: cleaning out /usr/local/poudriere/data/.m/

> You should probably see some large and growing number in sysctl kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay.

After the reboot I started a -J64 example. It has avoided the early
"witness exhausted". Again I ^C'd about an hour after the 2nd builder
had started. So: again cleaning out /usr/local/poudriere/data/.m/

Only seconds between:

# sysctl kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay
kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay: 276042
# sysctl kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay
kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay: 276427
# sysctl kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay
kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay: 277323
# sysctl kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay
kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay: 278027

I have found a measure of progress: zfs list's USED for
/usr/local/poudriere/data/.m is decreasing. So ztop's d/s was a good
classification: deletes.
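For reference, a minimal sh sketch (not what I actually ran) of how that
sampling could be scripted rather than repeated by hand; DATASET is a
placeholder for whatever dataset backs /usr/local/poudriere/data here:

# Sketch only: sample the frees-delay counter and the dataset's USED
# together, with timestamps, until interrupted.
DATASET=zamd64/usr/local/poudriere/data   # placeholder; adjust to the real dataset
while true; do
        date "+%Y-%m-%d %H:%M:%S"
        sysctl -n kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay
        zfs list -H -o used "$DATASET"
        sleep 30
done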
>> #5 0xffffffff829a87d2 at zfs_rmnode+0x72
>> #6 0xffffffff829b658d at zfs_freebsd_reclaim+0x3d
>> #7 0xffffffff8113a495 at VOP_RECLAIM_APV+0x35
>> #8 0xffffffff80c5a7d9 at vgonel+0x3a9
>> #9 0xffffffff80c5af7f at vrecycle+0x3f
>> #10 0xffffffff829b643e at zfs_freebsd_inactive+0x4e
>> #11 0xffffffff80c598cf at vinactivef+0xbf
>> #12 0xffffffff80c590da at vput_final+0x2aa
>> #13 0xffffffff80c68886 at kern_funlinkat+0x2f6
>> #14 0xffffffff80c68588 at sys_unlink+0x28
>> #15 0xffffffff8106323f at amd64_syscall+0x14f
>> #16 0xffffffff8103512b at fast_syscall_common+0xf8
>
> What we don't see here is what the quiesce and sync threads of the pool are actually doing. The sync thread has plenty of different jobs, including async write, async destroy, scrub and others, that may all delay each other.
>
> Before you rebooted the system, depending how alive it is, could you save a number of outputs of `procstat -akk`, or at least specifically `procstat -akk | grep txg_thread_enter` if the full output is hard? Or somehow else observe what they are doing.

# procstat -akk > ~/mmjnk00.txt
# procstat -akk > ~/mmjnk01.txt
# procstat -akk > ~/mmjnk02.txt
# procstat -akk > ~/mmjnk03.txt
# procstat -akk > ~/mmjnk04.txt
# procstat -akk > ~/mmjnk05.txt
# grep txg_thread_enter ~/mmjnk0[0-5].txt
/usr/home/root/mmjnk00.txt:    6 100881 zfskern          txg_thread_enter  mi_switch+0x173 sleepq_switch+0x104 _cv_wait+0x165 txg_thread_wait+0xeb txg_quiesce_thread+0x144 fork_exit+0x82 fork_trampoline+0xe
/usr/home/root/mmjnk00.txt:    6 100882 zfskern          txg_thread_enter  mi_switch+0x173 sleepq_switch+0x104 sleepq_timedwait+0x4b _cv_timedwait_sbt+0x188 zio_wait+0x3c9 dsl_pool_sync+0x139 spa_sync+0xc68 txg_sync_thread+0x2eb fork_exit+0x82 fork_trampoline+0xe
/usr/home/root/mmjnk01.txt:    6 100881 zfskern          txg_thread_enter  mi_switch+0x173 sleepq_switch+0x104 _cv_wait+0x165 txg_thread_wait+0xeb txg_quiesce_thread+0x144 fork_exit+0x82 fork_trampoline+0xe
/usr/home/root/mmjnk01.txt:    6 100882 zfskern          txg_thread_enter  mi_switch+0x173 sleepq_switch+0x104 sleepq_timedwait+0x4b _cv_timedwait_sbt+0x188 zio_wait+0x3c9 dsl_pool_sync+0x139 spa_sync+0xc68 txg_sync_thread+0x2eb fork_exit+0x82 fork_trampoline+0xe
/usr/home/root/mmjnk02.txt:    6 100881 zfskern          txg_thread_enter  mi_switch+0x173 sleepq_switch+0x104 _cv_wait+0x165 txg_thread_wait+0xeb txg_quiesce_thread+0x144 fork_exit+0x82 fork_trampoline+0xe
/usr/home/root/mmjnk02.txt:    6 100882 zfskern          txg_thread_enter  mi_switch+0x173 sleepq_switch+0x104 sleepq_timedwait+0x4b _cv_timedwait_sbt+0x188 zio_wait+0x3c9 dsl_pool_sync+0x139 spa_sync+0xc68 txg_sync_thread+0x2eb fork_exit+0x82 fork_trampoline+0xe
/usr/home/root/mmjnk03.txt:    6 100881 zfskern          txg_thread_enter  mi_switch+0x173 sleepq_switch+0x104 _cv_wait+0x165 txg_thread_wait+0xeb txg_quiesce_thread+0x144 fork_exit+0x82 fork_trampoline+0xe
/usr/home/root/mmjnk03.txt:    6 100882 zfskern          txg_thread_enter  mi_switch+0x173 sleepq_switch+0x104 sleepq_timedwait+0x4b _cv_timedwait_sbt+0x188 zio_wait+0x3c9 dsl_pool_sync+0x139 spa_sync+0xc68 txg_sync_thread+0x2eb fork_exit+0x82 fork_trampoline+0xe
/usr/home/root/mmjnk04.txt:    6 100881 zfskern          txg_thread_enter  mi_switch+0x173 sleepq_switch+0x104 _cv_wait+0x165 txg_thread_wait+0xeb txg_quiesce_thread+0x144 fork_exit+0x82 fork_trampoline+0xe
/usr/home/root/mmjnk04.txt:    6 100882 zfskern          txg_thread_enter  mi_switch+0x173 sleepq_switch+0x104 sleepq_timedwait+0x4b _cv_timedwait_sbt+0x188 zio_wait+0x3c9 dsl_pool_sync+0x139 spa_sync+0xc68 txg_sync_thread+0x2eb fork_exit+0x82 fork_trampoline+0xe
/usr/home/root/mmjnk05.txt:    6 100881 zfskern          txg_thread_enter  mi_switch+0x173 sleepq_switch+0x104 _cv_wait+0x165 txg_thread_wait+0xeb txg_quiesce_thread+0x144 fork_exit+0x82 fork_trampoline+0xe
/usr/home/root/mmjnk05.txt:    6 100882 zfskern          txg_thread_enter  mi_switch+0x173 sleepq_switch+0x104 sleepq_timedwait+0x4b _cv_timedwait_sbt+0x188 zio_wait+0x3c9 dsl_pool_sync+0x139 spa_sync+0xc68 txg_sync_thread+0x2eb fork_exit+0x82 fork_trampoline+0xe

(Hopefully that will be a sufficiently useful start.)
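An equivalent way to gather the same stacks (a sketch, not what I ran)
is a fixed number of timestamped snapshots into one file, so the
quiesce/sync thread states can be compared over time:

# Sketch only: 6 timestamped txg-thread snapshots, 10 s apart.
for i in 0 1 2 3 4 5; do
        date "+%Y-%m-%d %H:%M:%S" >> ~/txg_stacks.txt
        procstat -akk | grep txg_thread_enter >> ~/txg_stacks.txt
        sleep 10
done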
> `zpool status`, `zpool get all` and `sysctl -a` would also not harm.

It is a very simple zpool configuration: one partition. I only use it
for bectl BE reasons, not the general range of reasons for using zfs.
I created the media with my normal content, then checkpointed before
doing the git fetch to start to set up the experiment.

# zpool status
  pool: zamd64
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not
        support the features. See zpool-features(7) for details.
checkpoint: created Sun Sep 3 11:46:54 2023, consumes 2.17M
config:

        NAME            STATE     READ WRITE CKSUM
        zamd64          ONLINE       0     0     0
          gpt/amd64zfs  ONLINE       0     0     0

errors: No known data errors

There was also a snapshot in place before I did the checkpoint
operation.
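For completeness, the space held by the checkpoint and the pre-existing
snapshot can be inspected with the usual commands (a sketch; I am not
reproducing that output here):

# How much space the checkpoint and any snapshots are holding:
zpool get checkpoint zamd64
zfs list -t snapshot -r -o name,used,referenced zamd64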
I deliberately did not use my typical openzfs-2.1-freebsd compatibility
setting but used the defaults when creating the pool:

# zpool get all
NAME    PROPERTY                       VALUE                   SOURCE
zamd64  size                           872G                    -
zamd64  capacity                       21%                     -
zamd64  altroot                        -                       default
zamd64  health                         ONLINE                  -
zamd64  guid                           4817074778276814820     -
zamd64  version                        -                       default
zamd64  bootfs                         zamd64/ROOT/main-amd64  local
zamd64  delegation                     on                      default
zamd64  autoreplace                    off                     default
zamd64  cachefile                      -                       default
zamd64  failmode                       wait                    default
zamd64  listsnapshots                  off                     default
zamd64  autoexpand                     off                     default
zamd64  dedupratio                     1.00x                   -
zamd64  free                           688G                    -
zamd64  allocated                      184G                    -
zamd64  readonly                       off                     -
zamd64  ashift                         0                       default
zamd64  comment                        -                       default
zamd64  expandsize                     -                       -
zamd64  freeing                        0                       -
zamd64  fragmentation                  17%                     -
zamd64  leaked                         0                       -
zamd64  multihost                      off                     default
zamd64  checkpoint                     2.17M                   -
zamd64  load_guid                      17719601284614388220    -
zamd64  autotrim                       off                     default
zamd64  compatibility                  off                     default
zamd64  bcloneused                     0                       -
zamd64  bclonesaved                    0                       -
zamd64  bcloneratio                    1.00x                   -
zamd64  feature@async_destroy          enabled                 local
zamd64  feature@empty_bpobj            active                  local
zamd64  feature@lz4_compress           active                  local
zamd64  feature@multi_vdev_crash_dump  enabled                 local
zamd64  feature@spacemap_histogram     active                  local
zamd64  feature@enabled_txg            active                  local
zamd64  feature@hole_birth             active                  local
zamd64  feature@extensible_dataset     active                  local
zamd64  feature@embedded_data          active                  local
zamd64  feature@bookmarks              enabled                 local
zamd64  feature@filesystem_limits      enabled                 local
zamd64  feature@large_blocks           enabled                 local
zamd64  feature@large_dnode            enabled                 local
zamd64  feature@sha512                 enabled                 local
zamd64  feature@skein                  enabled                 local
zamd64  feature@edonr                  enabled                 local
zamd64  feature@userobj_accounting     active                  local
zamd64  feature@encryption             enabled                 local
zamd64  feature@project_quota          active                  local
zamd64  feature@device_removal         enabled                 local
zamd64  feature@obsolete_counts        enabled                 local
zamd64  feature@zpool_checkpoint       active                  local
zamd64  feature@spacemap_v2            active                  local
zamd64  feature@allocation_classes     enabled                 local
zamd64  feature@resilver_defer         enabled                 local
zamd64  feature@bookmark_v2            enabled                 local
zamd64  feature@redaction_bookmarks    enabled                 local
zamd64  feature@redacted_datasets      enabled                 local
zamd64  feature@bookmark_written       enabled                 local
zamd64  feature@log_spacemap           active                  local
zamd64  feature@livelist               enabled                 local
zamd64  feature@device_rebuild         enabled                 local
zamd64  feature@zstd_compress          enabled                 local
zamd64  feature@draid                  enabled                 local
zamd64  feature@zilsaxattr             active                  local
zamd64  feature@head_errlog            active                  local
zamd64  feature@blake3                 enabled                 local
zamd64  feature@block_cloning          enabled                 local
zamd64  feature@vdev_zaps_v2           active                  local
zamd64  feature@redaction_list_spill   disabled                local

/etc/sysctl.conf does have:

vfs.zfs.min_auto_ashift=12
vfs.zfs.per_txg_dirty_frees_percent=5

The vfs.zfs.per_txg_dirty_frees_percent setting is from prior help from
Mateusz Guzik, where after testing the change I reported:

Result summary: Seems to have avoided the sustained periods of low load
average activity. Much better for the context.

But that was for a different machine (aarch64, 8 cores), although it was
also for poudriere bulk use. It turns out the default of 30 was causing
something like what is seen here: I could have presented some of the
information via the small load average figures here. (Note: 5 is the old
default, 30 is the newer one. Other contexts have other problems with 5:
there is no single right setting and no automated configuration.)
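For anyone who wants to experiment with that tradeoff, the tunable can
be inspected and changed at runtime (a sketch; 30 here is just the newer
default, not a recommendation, and my one-line summary of its meaning is
approximate):

# Roughly: limits how much of a TXG's dirty data may come from frees.
sysctl vfs.zfs.per_txg_dirty_frees_percent
# Try another value at runtime for comparison (example value only):
sysctl vfs.zfs.per_txg_dirty_frees_percent=30
# Persist across reboots by setting it in /etc/sysctl.conf:
#   vfs.zfs.per_txg_dirty_frees_percent=5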
Other than those 2 items, zfs is untuned (defaults). sysctl -a is a lot
more output (864930 bytes) so I'll skip it for now.

> PS: I may be wrong, but USB in "USB3 NVMe SSD storage" makes me shiver. Make sure there are no storage problems, like some huge delays, timeouts, etc., that can be seen, for example, as busy percents regularly spiking far above 100% in your `gstat -spod`.

The "gstat -spod" output showed (and shows): around 0.8ms/w to 3ms/w,
mostly at the lower end of the range, < 98% busy, and no spikes above
100%. It is a previously unused Samsung PSSD T7 Touch. I was not
prepared to replace the content of a PCIe slot's media or an M.2
connection's media for this temporary purpose: I have no spare supply of
those, so no simple swapping was possible.

===
Mark Millard
marklmi at yahoo.com