Subject: Re: An attempted test of main's "git: 2ad756a6bbb3" "merge openzfs/zfs@95f71c019" that did not go as planned
From: Mark Millard <marklmi@yahoo.com>
Date: Mon, 4 Sep 2023 08:45:38 -0700
To: Alexander Motin
Cc: dev-commits-src-main@freebsd.org, Current FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-current

On Sep 4, 2023, at 06:09, Alexander Motin wrote:

> On 04.09.2023 05:56, Mark Millard wrote:
>> On Sep 4, 2023, at 02:00, Mark Millard wrote:
>>> On Sep 3, 2023, at 23:35, Mark Millard wrote:
>>>> On Sep 3, 2023, at 22:06, Alexander Motin wrote:
>>>>> On 03.09.2023 22:54, Mark Millard wrote:
>>>>>> After that ^t produced the likes of:
>>>>>> load: 6.39 cmd: sh 4849 [tx->tx_quiesce_done_cv] 10047.33r 0.51u 121.32s 1% 13004k
>>>>>
>>>>> So the full state is not "tx->tx", but is actually "tx->tx_quiesce_done_cv", which means the thread is waiting for a new transaction to be opened, which means some previous one has to be quiesced and then synced.
>>>>>
>>>>>> #0 0xffffffff80b6f103 at mi_switch+0x173
>>>>>> #1 0xffffffff80bc0f24 at sleepq_switch+0x104
>>>>>> #2 0xffffffff80aec4c5 at _cv_wait+0x165
>>>>>> #3 0xffffffff82aba365 at txg_wait_open+0xf5
>>>>>> #4 0xffffffff82a11b81 at dmu_free_long_range+0x151
>>>>>
>>>>> Here it seems like the transaction commit is being waited on due to a large amount of delete operations, which ZFS tries to spread between separate TXGs.
>>>>
>>>> That fit the context: cleaning out /usr/local/poudriere/data/.m/
>>>>
>>>>> You should probably see some large and growing number in sysctl kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay.
>>>>
>>>> After the reboot I started a -J64 example. It avoided the
>>>> early "witness exhausted". Again I ^C'd about an hour after
>>>> the 2nd builder had started. So: again cleaning out
>>>> /usr/local/poudriere/data/.m/ . Only seconds between:
>>>>
>>>> # sysctl kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay
>>>> kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay: 276042
>>>>
>>>> # sysctl kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay
>>>> kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay: 276427
>>>>
>>>> # sysctl kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay
>>>> kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay: 277323
>>>>
>>>> # sysctl kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay
>>>> kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay: 278027
>
> As expected, deletes trigger and wait for TXG commits.
>
>>>> I have found a measure of progress: zfs list's USED
>>>> for /usr/local/poudriere/data/.m is decreasing. So
>>>> ztop's d/s was a good classification: deletes.
>>>>
>>>>>> #5 0xffffffff829a87d2 at zfs_rmnode+0x72
>>>>>> #6 0xffffffff829b658d at zfs_freebsd_reclaim+0x3d
>>>>>> #7 0xffffffff8113a495 at VOP_RECLAIM_APV+0x35
>>>>>> #8 0xffffffff80c5a7d9 at vgonel+0x3a9
>>>>>> #9 0xffffffff80c5af7f at vrecycle+0x3f
>>>>>> #10 0xffffffff829b643e at zfs_freebsd_inactive+0x4e
>>>>>> #11 0xffffffff80c598cf at vinactivef+0xbf
>>>>>> #12 0xffffffff80c590da at vput_final+0x2aa
>>>>>> #13 0xffffffff80c68886 at kern_funlinkat+0x2f6
>>>>>> #14 0xffffffff80c68588 at sys_unlink+0x28
>>>>>> #15 0xffffffff8106323f at amd64_syscall+0x14f
>>>>>> #16 0xffffffff8103512b at fast_syscall_common+0xf8
>>>>>
>>>>> What we don't see here is what the quiesce and sync threads of the pool are actually doing. The sync thread has plenty of different jobs, including async write, async destroy, scrub and others, that all may delay each other.
>>>>>
>>>>> Before you reboot the system, depending on how alive it is, could you save a number of outputs of `procstat -akk`, or at least `procstat -akk | grep txg_thread_enter` if the full output is hard? Or somehow else observe what they are doing.
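For anyone wanting to watch this live: a minimal sh loop along the
following lines records both the counter and the kernel stacks while
the cleanup runs. The file names and the 10 second interval here are
arbitrary choices, not necessarily what produced the ~/mmjnk0*.txt
snapshots quoted below.

#!/bin/sh
# Sketch: sample the dirty-frees throttle counter and all kernel
# thread stacks a few times while the deletes are in progress.
n=0
while [ "$n" -le 5 ]; do
    sysctl kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay
    procstat -akk > "$HOME/mmjnk0${n}.txt"
    sleep 10
    n=$((n+1))
done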
>>>>
>>>> # grep txg_thread_enter ~/mmjnk0[0-5].txt
>>>> /usr/home/root/mmjnk00.txt:    6 100881 zfskern          txg_thread_enter    mi_switch+0x173 sleepq_switch+0x104 _cv_wait+0x165 txg_thread_wait+0xeb txg_quiesce_thread+0x144 fork_exit+0x82 fork_trampoline+0xe
>>>> /usr/home/root/mmjnk00.txt:    6 100882 zfskern          txg_thread_enter    mi_switch+0x173 sleepq_switch+0x104 sleepq_timedwait+0x4b _cv_timedwait_sbt+0x188 zio_wait+0x3c9 dsl_pool_sync+0x139 spa_sync+0xc68 txg_sync_thread+0x2eb fork_exit+0x82 fork_trampoline+0xe
>>>> /usr/home/root/mmjnk01.txt:    6 100881 zfskern          txg_thread_enter    mi_switch+0x173 sleepq_switch+0x104 _cv_wait+0x165 txg_thread_wait+0xeb txg_quiesce_thread+0x144 fork_exit+0x82 fork_trampoline+0xe
>>>> /usr/home/root/mmjnk01.txt:    6 100882 zfskern          txg_thread_enter    mi_switch+0x173 sleepq_switch+0x104 sleepq_timedwait+0x4b _cv_timedwait_sbt+0x188 zio_wait+0x3c9 dsl_pool_sync+0x139 spa_sync+0xc68 txg_sync_thread+0x2eb fork_exit+0x82 fork_trampoline+0xe
>>>> /usr/home/root/mmjnk02.txt:    6 100881 zfskern          txg_thread_enter    mi_switch+0x173 sleepq_switch+0x104 _cv_wait+0x165 txg_thread_wait+0xeb txg_quiesce_thread+0x144 fork_exit+0x82 fork_trampoline+0xe
>>>> /usr/home/root/mmjnk02.txt:    6 100882 zfskern          txg_thread_enter    mi_switch+0x173 sleepq_switch+0x104 sleepq_timedwait+0x4b _cv_timedwait_sbt+0x188 zio_wait+0x3c9 dsl_pool_sync+0x139 spa_sync+0xc68 txg_sync_thread+0x2eb fork_exit+0x82 fork_trampoline+0xe
>>>> /usr/home/root/mmjnk03.txt:    6 100881 zfskern          txg_thread_enter    mi_switch+0x173 sleepq_switch+0x104 _cv_wait+0x165 txg_thread_wait+0xeb txg_quiesce_thread+0x144 fork_exit+0x82 fork_trampoline+0xe
>>>> /usr/home/root/mmjnk03.txt:    6 100882 zfskern          txg_thread_enter    mi_switch+0x173 sleepq_switch+0x104 sleepq_timedwait+0x4b _cv_timedwait_sbt+0x188 zio_wait+0x3c9 dsl_pool_sync+0x139 spa_sync+0xc68 txg_sync_thread+0x2eb fork_exit+0x82 fork_trampoline+0xe
>>>> /usr/home/root/mmjnk04.txt:    6 100881 zfskern          txg_thread_enter    mi_switch+0x173 sleepq_switch+0x104 _cv_wait+0x165 txg_thread_wait+0xeb txg_quiesce_thread+0x144 fork_exit+0x82 fork_trampoline+0xe
>>>> /usr/home/root/mmjnk04.txt:    6 100882 zfskern          txg_thread_enter    mi_switch+0x173 sleepq_switch+0x104 sleepq_timedwait+0x4b _cv_timedwait_sbt+0x188 zio_wait+0x3c9 dsl_pool_sync+0x139 spa_sync+0xc68 txg_sync_thread+0x2eb fork_exit+0x82 fork_trampoline+0xe
>>>> /usr/home/root/mmjnk05.txt:    6 100881 zfskern          txg_thread_enter    mi_switch+0x173 sleepq_switch+0x104 _cv_wait+0x165 txg_thread_wait+0xeb txg_quiesce_thread+0x144 fork_exit+0x82 fork_trampoline+0xe
>>>> /usr/home/root/mmjnk05.txt:    6 100882 zfskern          txg_thread_enter    mi_switch+0x173 sleepq_switch+0x104 sleepq_timedwait+0x4b _cv_timedwait_sbt+0x188 zio_wait+0x3c9 dsl_pool_sync+0x139 spa_sync+0xc68 txg_sync_thread+0x2eb fork_exit+0x82 fork_trampoline+0xe
>
> So the quiesce threads are idle, while the sync thread is waiting for TXG commit writes to complete. I see no crime; we should see the same just for slow storage.
>
>>>>> `zpool status`, `zpool get all` and `sysctl -a` would also not harm.
>>>>
>>>> It is a very simple zpool configuration: one partition.
>>>> I only use it for bectl BE reasons, not the general
>>>> range of reasons for using zfs. I created the media with
>>>> my normal content, then checkpointed before doing the
>>>> git fetch to start to set up the experiment.
>
> OK. And I see no scrub or async destroy that could delay the sync thread. Though I don't see them in the above procstat either.
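For completeness, the absence of a scrub or of pending async destroy
work is also easy to double-check from the shell. Something like the
following, where "zroot" is only a placeholder for whatever the
single-partition pool is actually named here:

# zpool status -v zroot | grep -i scan
# zpool get freeing,checkpoint zroot

The status "scan:" line reports any scrub/resilver in progress, and a
nonzero "freeing" value would indicate async destroy work still
queued.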
>
>>>> /etc/sysctl.conf does have:
>>>>
>>>> vfs.zfs.min_auto_ashift=12
>>>> vfs.zfs.per_txg_dirty_frees_percent=5
>>>>
>>>> The vfs.zfs.per_txg_dirty_frees_percent setting is from prior
>>>> help from Mateusz Guzik, where after testing the change I
>>>> reported:
>>>>
>>>> Result summary: Seems to have avoided the sustained periods
>>>> of low load average activity. Much better for the context.
>>>>
>>>> But that was for a different machine (aarch64, 8 cores),
>>>> although it was also for poudriere bulk use.
>>>>
>>>> It turned out the default of 30 was causing something like
>>>> what is seen here: I could have presented some of that
>>>> information via similarly small load average figures.
>>>>
>>>> (Note: 5 is the old default, 30 is the newer one. Other contexts
>>>> have other problems with 5: there is no single right setting and
>>>> no automated configuration.)
>
> per_txg_dirty_frees_percent is directly related to the delete delays we see here. You are forcing ZFS to commit transactions each 5% of the dirty ARC limit, which is 5% of 10% of memory size. I haven't looked at that code recently, but I guess setting it too low can make ZFS commit transactions too often, increasing write inflation for the underlying storage. I would propose you restore the default and try again.

While this machine is different, the original problem was worse than
the issue here: the load average was less than 1 for most of the
parallel bulk build when 30 was used. The fraction of time spent
waiting was much larger than with 5. If I understand right, both
too high and too low a value for a given type of context can lead to
increased elapsed time, and getting it set near optimal is a
non-obvious exploration.

The only reason I'm not using USE_TMPFS=all in this context is that
the resulting relative lack of zfs I/O would avoid most of the recent
problems that I was trying to test for.

An overall point for the goal of my activity is: what makes a good
test context for checking if ZFS is again safe to use? Maybe other
tradeoffs make, say, 4 hardware threads more reasonable than 32.

A reported observation was that having the number of parallel
builders notably greater than the hardware thread count seemed to
make the deadlocks (and other issues?) more likely, and that
ALLOW_MAKE_JOBS seemed to make them less likely. This is how I got
into the combination I was trying.

Also, it may be that testing for a repeat of the recent problems
just does not fit with use of the USB3 media that I've attempted to
use.

For ZFS not having deadlocks and the other issues, I have working
configurations for doing bulk builds, even bulk -a . (I once got
about half of one done on an 8 GiByte RPi4B before a power outage
stopped it. I rarely run bulk -a, and do so only as a check on my
normal configurations still being reasonable after FreeBSD has
progressed for a notable time, unless it is part of a "ZFS working?"
test, like here.)

>>>> Other than those 2 items, zfs is untuned (defaults).
>>>>
>>>> sysctl -a is a lot more output (864930 bytes) so I'll skip
>>>> it for now.
>>>>
>>>>> PS: I may be wrong, but USB in "USB3 NVMe SSD storage" makes me shiver. Make sure there are no storage problems, like some huge delays, timeouts, etc., that can be seen, for example, as busy percents regularly spiking far above 100% in your `gstat -spod`.
>>>>
>>>> The "gstat -spod" output showed (and shows): around 0.8ms/w to 3ms/w,
>>>> mostly at the lower end of the range. < 98% busy, no spikes to > 100%.
>>>> It is a previously unused Samsung PSSD T7 Touch.
>
> Is the ~98% busy most of the time there?

Yes.

> Unfortunately our umass does not support UASP,

I'm aware that FreeBSD has this lack of UASP support.

> i.e. supports only one command at a time,

I'd not thought about it before, but this might limit the utility of
testing this way for deadlocks and such that occur via openzfs.

> so many small I/Os may accumulate more latency than other interface storages would. Higher number of small transaction commits may not help it either.

Yea. It may be that the only reasonable test for ZFS would involve
replacing the content of some PCIe slot's or M.2 connection's media.
Given all the recent problems with ZFS ending up with data
corruptions and deadlocks on main after an openzfs import, I may
need to do that in order to have a safe test-only context before
progressing my normal contexts to/past such openzfs imports.

>>> A little more context here: that is for the "kB" figures seen
>>> during the cleanup/delete activity. During port builds into
>>> packages, larger "kB" figures are seen and the ms/w figures
>>> will tend to be larger as well. The larger sizes can also
>>> lead to reaching somewhat above 100 %busy some of the time.
>>>
>>> I'll also note that I've ended up doing a lot more write
>>> activity exploring than I'd expected.
>>>
>>>> I was not prepared to replace the content of a PCIe slot's media
>>>> or M.2 connection's media for the temporary purpose. No spare
>>>> supply for those, so no simple swapping for those.
>> Trying -J36 (so: 32+4) got to 470 built in about an hour
>> after [02] reached "Builder started". /usr/local/poudriere/data/.m
>> used a little under 40 GiBytes at that point. (I do not have
>> a file count.)
>> The cleanup seems to have gone somewhat faster after my ^C for
>> this context:
>> ^C[01:20:20] Error: Signal SIGINT caught, cleaning up and exiting
>> [01:20:20] [27] [00:02:54] Finished math/p5-Data-Float | p5-Data-Float-0.013: Success
>> [main-amd64-bulk_a-default] [2023-09-04_00h30m42s] [sigint:] Queued: 34588 Built: 502 Failed: 1 Skipped: 50 Ignored: 335 Fetched: 0 Tobuild: 33700 Time: 01:20:12
>> [01:20:22] Logs: /usr/local/poudriere/data/logs/bulk/main-amd64-bulk_a-default/2023-09-04_00h30m42s
>> [01:20:23] [25] [00:04:46] Finished www/p5-HTML-TreeBuilder-XPath | p5-HTML-TreeBuilder-XPath-0.14_1: Success
>> [01:20:24] Cleaning up
>> [02:17:01] Unmounting file systems
>> Exiting with status 1
>> So it took about an hour to clean up after 502 port builds into
>> packages (not published, though).
>> ( gstat -spod showed a fairly general, sustained lack of read activity,
>> instead of the comparatively small sustained amount I'd not mentioned
>> for the previous explorations. Maybe that helped. )
>
> I suppose more builders mean deleting more work directories at the same time?

Yep. One of the properties of poudriere that is visible via tmpfs
storage size timing is that the storage for a builder is not released
at the time the builder finishes a build: the storage stays around if
the builder is not used for a while, and the adjustment only happens
when it starts a new build (or the overall bulk run finishes). This
contributes to the USE_TMPFS=all RAM+SWAP requirements when the space
for a large build sticks around for a time after it is done.

> I don't know if it should cause cumulative effect, but I suppose should be at least linear.

Each builder-slot runs its own delete in parallel with the other
slots. So, for more slots than hardware threads, that adds more
waiting as well.
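As for restoring the default that you propose: for a retest that is
just a matter of (assuming I have the current default value of 30
right):

# sysctl vfs.zfs.per_txg_dirty_frees_percent=30

plus removing (or commenting out) the
vfs.zfs.per_txg_dirty_frees_percent=5 line in /etc/sysctl.conf so
that the next boot matches.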
It appears that I have a fundamental choice to make about how to
handle progressing to/past openzfs imports, and USB3 media is
probably excluded from the testing environment, if I end up having
one.

Thanks for your help.

===
Mark Millard
marklmi at yahoo.com