From nobody Mon Sep 04 07:13:51 2023 X-Original-To: freebsd-current@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4RfKd10nnXz4rslP for ; Mon, 4 Sep 2023 07:14:13 +0000 (UTC) (envelope-from marklmi@yahoo.com) Received: from sonic301-20.consmr.mail.gq1.yahoo.com (sonic301-20.consmr.mail.gq1.yahoo.com [98.137.64.146]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4RfKcz61hCz3WZS for ; Mon, 4 Sep 2023 07:14:11 +0000 (UTC) (envelope-from marklmi@yahoo.com) Authentication-Results: mx1.freebsd.org; dkim=pass header.d=yahoo.com header.s=s2048 header.b=DiZi4vA3; spf=pass (mx1.freebsd.org: domain of marklmi@yahoo.com designates 98.137.64.146 as permitted sender) smtp.mailfrom=marklmi@yahoo.com; dmarc=pass (policy=reject) header.from=yahoo.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s2048; t=1693811649; bh=rf4e2rs8lNvnhpUs3YXXW+YGU1zniG0RhEOgLu8cXiE=; h=From:Subject:Date:References:To:In-Reply-To:From:Subject:Reply-To; b=DiZi4vA3Mji52ejuvuAek/9RGC2GfyfPNx64pc9n9urYWev3Y2wzRqwosvvg1PxjDP4XFfpvwh9oJXnbX8gB6ljNg5CLhS0rSLbzpFyXOmiN2ccIL0hKkIfQZ4p/OYZoU+t7bnQvPpd5pGaIMfyqHMk9OKKpVDs19oYxdGGvpvr+X8FZLMb3b4vQ/FlADB31i1JgaFSPfUKbPXLISHBeQ64EgR6nQKHsm12dEIxEf/5VO2dq+cYyKDS++fM3Oo5QENL6q81r7f62EgIpPBF3qJkwh2lpow6ywYMPuTnnZ/OFjhzmSKcwqr2aNv4W/jRbKlVzaMkIJM3+88Jx6rc0qA== X-SONIC-DKIM-SIGN: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s2048; t=1693811649; bh=l2T8Wox99ShM5TpMbDYO+Gfg3mmSF49bPvryTsOYvig=; h=X-Sonic-MF:From:Subject:Date:To:From:Subject; b=hbWL/5wZllzQCAKhyl/SXYsPNuQAnwD46BGt80JcDwmwRakGsfr5JYdKotg63VY8TGMqSCkMlr9IISidrL8+8zQpCqN0SNM93xFWMwk9kjScq9bOlSy1VWNfV/FBcxk0HtjDAxTsdE6+FoKLBc6Nmr/ro/PzhpvdgqLM2KWva3iaHpgbJD6pn7RBWh24O7WyYHnFw3Pv2JoYGjqmzt/jycTs0tiWTG02egaSPFnCBEj6bybIs54Mg2LyJlOYmp3dTJ57ShLxHdYmiWV8L8ewBmP9mmI+q6xgbRs1/UdUN07Q9cNCkU8r8AXh6/uKdW1J9Jc9Vc9Nj9oHpgKs8pNxLw== X-YMail-OSG: t9JWtlkVM1kKrzmoUONXvk_vr3QaNXZXapLpEzCgmJ08Eli5OBmomhrPpN.qIHG Ua9KPrNMXkYQY35KpC383dmchffXt6gUnyvAAyDXBLqEiTh2FTD_Ja.ZzqDF13sbH1vLF5ywaTG. XFPo4c4P8gfQ715OZp14s1Ht0WVqPGhZxK_NDZN6C_hmtpxX49jdSS_zsiD6o.D9aeetJs_feQmM A3UEzCbZgyqAbsp0pf9eaHboRIjW2cf8ULhHrX3k4Ir6fbSOitkp0Fdw4RbCRIzr21bcUZHgFwTk r8YuPOY7659pFfdktFghCu9158I44ob9h7T1MZN5XUIAJiudJoc_NEB5tmG5wrpyyzZ.QmgoOWgE LGk30iwYFXZD8_rYRUuQqNX3I.A8MF9RAXwq4ruScHiwXgLlTujPeEfL.SOoyMhRPbQWfYo8QhU4 kz4JQV6EHxDzY1gC5Z6VIEBc8cw0D9p2jef7JeyLUnRFVvyPur.wGSGBUGXbNWAb0kEEbvUy.d3y 1V82fNHV12868Oh7JxpUvaDregz0E8sfXHOwDGAIyMG8EX6HKfLshX8GWx_Ls2Sf5ZJH4i95VbPP 6qGFX3kXKnMb7cr4xfx_xmzhaHtYf10nXISkrh4eHvirg9tX36N1Pghlb6QgpWB6AH1IdlQy8Ahb IuxT0H3hrWtugpDC3hbJpAdAXich3UohW7GOQE4O5SqjVRnGd0_uitGE9CIsAyUDg2BUhoJoH_ER FXjaWw7ZoMwLiEpGqoGU2SSw4w_5R0qDvR8iTxwXzZ4J6LNuom.2uhRkC948u6hqcIFM3r5yCsUj KLktqg9XDWiBW3_sm_DJyAKKsPdCiglCEkvq.09VyQoz4VdXyAkbwqOKsWLJLm3F8IDA6I1TvaWO ByrDjqqyy4gxQcp1AWmlaqbiNtfgpZab5LRgxPwPHV1V775LEF29RVAUUJYl7CwOkEp3w9TrYHy3 .QgKHFMQP2PvoXcYDuM1C9eHCE0vvasE2LLGhXgzrUEgIsh9n7334ES79yx2aLwCqQk_KH5lg3SC 9wPlStNSHV109z59accdoYw7DYqBJurjWulN91_IdDk1aUxchw86yAMv.9npT3Glex0T27xQEaeJ bgZ0xyAVLa81YhOrNJIOk79j9ny_FjYC8oubTYc8ToOHW4PJkkt29Qr55wIrNSCBP27wiRpJygnF fdj.od6qKEkzB8pbzwXS1N_OM.9MYPbDgVl3dzjvOeyqsxXL_Umm73RyzuKg._sqWX80H7_SvXZ7 0T98sT9xQZ_GSIVFXUfZqyM6EoVCN3G84UZ4gnlsyuB85BytWE_c3U0LwA.y0avaIv0FRgj4nMde 1g8RkU5fbgC29NIxsjV24_G1HdXQVtGjTTb6pq4fXhaj_qjf4LN28cPv5vdqhTPfwbssAiaYY6Sy RUuNzqmqV7n7oqljrPCsjWW09w3Vj9dSkryFh8fGNiqsKahtQHEAmmd4OwQAk1gGiEonRZ9Aak.S 7L4RhpuCARqr2nynlorn0xIQ6twJ9s6jZx3.nbCULu1ZelwUvIpsgwWT9L2P4ir2MZ5byU5uJKd2 WsH4XZu0VjPiRR.Edvj.XKsYDyExovh1_UVAvOq59OAaWsuXkFhW02zAkXg9W8oCMoCQe3Ur56lJ 5YGmEqEJF3fm8QkaqaUAC.K.vyUc0f6kdGWwIsHSRyoIJlVC35F81vkRbdZ2pA2bwsE5FTI2nAzA wBWuoqgOZluWPVSWJn9_mpJSwJI8wtelX6i.hGP7X97GBIy.O.R1Mn9SmSBtLFzXZ.aslnZWxsIt jjy_X2v3ERCLoIiGsMn_K1EKZar1fljxGQXROipqJg.uDZ3ivtafkMP34Z1geJH8E9smgTGYjxxB JC6z8lKhiFeClxh1YVmOt3Eqrbfkldn1xLul0GtvquADeArUE6zY.CSv3EUOlcsoycTkskphx9vP XKfuMFqKN0egMQP5BIyYNHKjuqNXcnveLg8dHMlw3z4NuLiUK3GULFR1wm2wwY_uNnEu2wH4nGGP UygJmKorm.1gbrd2BXLM.TCBvUa2rRKgzhF7s5pq5Ib8Cru_33Je04CqVnjLzp.JhQhTFdrLQT6A cOX1M2ASsXJPACRCJWScF9mqgHlWk880d0vis0tEG.t9kkCAgkw4d_aXRPgINZLI8SvDKgFevbhF 1X_1.1KU_Ckk8133ew9mlJ0S9xIx7M1nBJqhdfE_pghUz_DDjl.RXE1sNeO1V9XYMPo5yjUIUqtx awUenwFVwcqDYWi1ME2mBVxXM4UkXr2uKLhjXs84OqEZ_sXawEXMvU_rvgG.mD3dBHKtLorCq88c Zvw-- X-Sonic-MF: X-Sonic-ID: 349e7a22-7c61-4f01-8be8-89f64fd0ff2b Received: from sonic.gate.mail.ne1.yahoo.com by sonic301.consmr.mail.gq1.yahoo.com with HTTP; Mon, 4 Sep 2023 07:14:09 +0000 Received: by hermes--production-bf1-865889d799-cgv22 (Yahoo Inc. Hermes SMTP Server) with ESMTPA ID 55f82fecaa0818bcdc051b884e959c45; Mon, 04 Sep 2023 07:14:03 +0000 (UTC) From: Mark Millard Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable List-Id: Discussions about the use of FreeBSD-current List-Archive: https://lists.freebsd.org/archives/freebsd-current List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-current@freebsd.org Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3731.700.6\)) Subject: Re: An attempted test of main's "git: 2ad756a6bbb3" "merge openzfs/zfs@95f71c019" that did not go as planned Date: Mon, 4 Sep 2023 00:13:51 -0700 References: To: dev-commits-src-main@freebsd.org, Current FreeBSD In-Reply-To: Message-Id: <8346B473-C75E-44DC-8EB9-7F71E37828E0@yahoo.com> X-Mailer: Apple Mail (2.3731.700.6) X-Spamd-Bar: --- X-Spamd-Result: default: False [-3.50 / 15.00]; NEURAL_HAM_LONG(-1.00)[-1.000]; NEURAL_HAM_MEDIUM(-1.00)[-1.000]; NEURAL_HAM_SHORT(-1.00)[-1.000]; DMARC_POLICY_ALLOW(-0.50)[yahoo.com,reject]; MV_CASE(0.50)[]; R_DKIM_ALLOW(-0.20)[yahoo.com:s=s2048]; R_SPF_ALLOW(-0.20)[+ptr:yahoo.com]; MIME_GOOD(-0.10)[text/plain]; FROM_HAS_DN(0.00)[]; RCVD_VIA_SMTP_AUTH(0.00)[]; TO_MATCH_ENVRCPT_SOME(0.00)[]; MLMMJ_DEST(0.00)[freebsd-current@freebsd.org]; ARC_NA(0.00)[]; RCVD_IN_DNSWL_NONE(0.00)[98.137.64.146:from]; BLOCKLISTDE_FAIL(0.00)[98.137.64.146:server fail]; DWL_DNSWL_NONE(0.00)[yahoo.com:dkim]; ASN(0.00)[asn:36647, ipnet:98.137.64.0/20, country:US]; RWL_MAILSPIKE_POSSIBLE(0.00)[98.137.64.146:from]; DKIM_TRACE(0.00)[yahoo.com:+]; FREEMAIL_FROM(0.00)[yahoo.com]; TO_DN_SOME(0.00)[]; MID_RHS_MATCH_FROM(0.00)[]; RCPT_COUNT_TWO(0.00)[2]; RCVD_TLS_LAST(0.00)[]; FROM_EQ_ENVFROM(0.00)[]; FREEMAIL_ENVFROM(0.00)[yahoo.com]; MIME_TRACE(0.00)[0:+]; RCVD_COUNT_TWO(0.00)[2] X-Rspamd-Queue-Id: 4RfKcz61hCz3WZS On Sep 3, 2023, at 19:54, Mark Millard wrote: > ThreadRipper 1950X (32 hardware threads) doing bulk -J128 > with USE_TMPFS=3Dno , no ALLOW_MAKE_JOBS , no > ALLOW_MAKE_JOBS_PACKAGES , USB3 NVMe SSD storage/ZFS-boot-media, > debug system build in use : >=20 > [00:03:44] Building 34214 packages using up to 128 builders > [00:03:44] Hit CTRL+t at any time to see build progress and stats > [00:03:44] [01] [00:00:00] Builder starting > [00:04:37] [01] [00:00:53] Builder started > [00:04:37] [01] [00:00:00] Building ports-mgmt/pkg | pkg-1.20.6 > [00:05:53] [01] [00:01:16] Finished ports-mgmt/pkg | pkg-1.20.6: = Success > [00:06:15] [01] [00:00:00] Building print/indexinfo | indexinfo-0.3.1 > [00:06:15] [02] [00:00:00] Builder starting > . . . > [00:06:18] [128] [00:00:00] Builder starting > [00:07:42] [01] [00:01:27] Finished print/indexinfo | indexinfo-0.3.1: = Success > [00:07:45] [01] [00:00:00] Building devel/gettext-runtime | = gettext-runtime-0.22_1 > [00:18:45] [01] [00:11:00] Finished devel/gettext-runtime | = gettext-runtime-0.22_1: Success > [00:19:06] [01] [00:00:00] Building devel/gmake | gmake-4.3_2 > [00:24:13] [01] [00:05:07] Finished devel/gmake | gmake-4.3_2: Success > [00:24:39] [01] [00:00:00] Building devel/libtextstyle | = libtextstyle-0.22 > [00:31:08] [125] [00:24:50] Builder started > [00:31:08] [125] [00:00:00] Building print/t1utils | t1utils-1.32 > [00:31:15] [33] [00:25:00] Builder started > [00:31:15] [81] [00:24:59] Builder started > [00:31:15] [33] [00:00:00] Building databases/xapian-core | = xapian-core-1.4.23,1 > [00:31:15] [13] [00:25:00] Builder started > [00:31:15] [81] [00:00:00] Building devel/bmake | bmake-20230723 > [00:31:15] [13] [00:00:00] Building devel/evdev-proto | = evdev-proto-5.8 > [00:31:16] [41] [00:25:00] Builder started > [00:31:16] [41] [00:00:00] Building devel/pcre | pcre-8.45_3 > . . . >=20 > (Looks like lang/go120 ignores the lack of ALLOW_MAKE_JOBS . > There may be others that still have signficant parallel > activity.) >=20 > [main-amd64-bulk_a-default] [2023-09-03_13h48m45s] [parallel_build:] = Queued: 34588 Built: 727 Failed: 1 Skipped: 40 Ignored: 335 = Fetched: 0 Tobuild: 33485 Time: 01:36:51 >=20 > (So about 1 hr after the last "Builder starting" it had > built 727.) >=20 > The vast majority of the time: lots of cpdup's with tx->tx > showing most of the time for STATE but showing having some > CPU time. >=20 > ^T commonly showed various Builders in starting PHASE for > 3min..6min. >=20 > Around 66% mean Idle time (guess from watching top). >=20 > After ^C "gstat -spod" reports it is almost always writing > 2200 to 2500 writes per second or so for *hours* (still > going on). >=20 > ztop reports 1500 to 3200 d/s or so almost always for > Dataset zamd64/poudriere/data/.m instead (also still going > on). Normally no other Dataset is shown. >=20 > With all the disk I/O activity, this is definitely "live" > in some sense. But I've no clue if it is just repeating > itself over and over vs. if it making some sort of progress. >=20 > For reference for the ^C and after: >=20 > ^C[01:39:00] [20] [00:00:03] Building sysutils/linux-c7-dosfstools | = linux-c7-dosfstools-3.0.20 > [01:39:00] [93] [00:07:12] Finished science/dimod | dimod-0.12.11: = Success > [01:39:00] Error: Signal SIGINT caught, cleaning up and exiting > [01:39:02] [63] [00:06:34] Finished archivers/unarj | unarj-2.65_2: = Success > [01:39:03] [128] [00:07:47] Finished sysutils/shuf | shuf-3.0: Success > [01:39:04] [113] [00:07:06] Finished devel/bsddialog | = bsddialog-0.4.1: Success > [main-amd64-bulk_a-default] [2023-09-03_13h48m45s] [sigint:] Queued: = 34588 Built: 752 Failed: 1 Skipped: 40 Ignored: 335 Fetched: = 0 Tobuild: 33460 Time: 01:38:56 > [01:39:06] Logs: = /usr/local/poudriere/data/logs/bulk/main-amd64-bulk_a-default/2023-09-03_1= 3h48m45s > [01:39:14] [12] [00:09:07] Finished archivers/rzip | rzip-2.1_1: = Success > [01:39:14] Cleaning up > exit: cannot open ./var/run/49_nohang.pid: No such file or directory > exit: cannot open ./var/run/87_nohang.pid: No such file or directory >=20 > After that ^t produced the likes of: >=20 > load: 6.39 cmd: sh 4849 [tx->tx_quiesce_done_cv] 10047.33r 0.51u = 121.32s 1% 13004k > #0 0xffffffff80b6f103 at mi_switch+0x173 > #1 0xffffffff80bc0f24 at sleepq_switch+0x104 > #2 0xffffffff80aec4c5 at _cv_wait+0x165 > #3 0xffffffff82aba365 at txg_wait_open+0xf5 > #4 0xffffffff82a11b81 at dmu_free_long_range+0x151 > #5 0xffffffff829a87d2 at zfs_rmnode+0x72 > #6 0xffffffff829b658d at zfs_freebsd_reclaim+0x3d > #7 0xffffffff8113a495 at VOP_RECLAIM_APV+0x35 > #8 0xffffffff80c5a7d9 at vgonel+0x3a9 > #9 0xffffffff80c5af7f at vrecycle+0x3f > #10 0xffffffff829b643e at zfs_freebsd_inactive+0x4e > #11 0xffffffff80c598cf at vinactivef+0xbf > #12 0xffffffff80c590da at vput_final+0x2aa > #13 0xffffffff80c68886 at kern_funlinkat+0x2f6 > #14 0xffffffff80c68588 at sys_unlink+0x28 > #15 0xffffffff8106323f at amd64_syscall+0x14f > #16 0xffffffff8103512b at fast_syscall_common+0xf8 >=20 > The console/logs do report "witness exhausted": >=20 > . . . > Sep 3 13:41:08 amd64-ZFS login[1751]: ROOT LOGIN (root) ON ttyv0 > Sep 3 13:51:35 amd64-ZFS kernel: witness_lock_list_get: witness = exhausted > Sep 3 14:26:38 amd64-ZFS kernel: pid 27418 (conftest), jid 245, uid = 0: exited on signal 11 (core dumped) > . . . >=20 > So it did not take long for the "witness exhausted" to > happen. >=20 > # uname -apKU > FreeBSD amd64-ZFS 15.0-CURRENT FreeBSD 15.0-CURRENT amd64 1500000 #74 = main-n265143-525bc87f54f2-dirty: Sun Sep 3 13:35:04 PDT 2023 = root@amd64_ZFS:/usr/obj/BUILDs/main-amd64-dbg-clang/usr/main-src/amd64.amd= 64/sys/GENERIC-DBG amd64 amd64 1500000 1500000 >=20 >=20 > Looks like I'll be forcing the machine to reboot or > to power off. The media was deliberately set up for > doing risky tests. It is not my normal environment. >=20 Using -J64 instead of -J128. It does avoid "witness exhausted" for at least the 1st hour. [00:03:51] Building 34214 packages using up to 64 builders [00:03:51] Hit CTRL+t at any time to see build progress and stats [00:03:51] [01] [00:00:00] Builder starting [00:04:49] [01] [00:00:58] Builder started [00:04:49] [01] [00:00:00] Building ports-mgmt/pkg | pkg-1.20.6 [00:06:07] [01] [00:01:18] Finished ports-mgmt/pkg | pkg-1.20.6: Success [00:06:31] [01] [00:00:00] Building print/indexinfo | indexinfo-0.3.1 [00:06:31] [02] [00:00:00] Builder starting . . . [00:06:33] [64] [00:00:00] Builder starting [00:09:06] [01] [00:02:35] Finished print/indexinfo | indexinfo-0.3.1: = Success [00:09:08] [01] [00:00:00] Building devel/gettext-runtime | = gettext-runtime-0.22_1 [00:21:49] [16] [00:15:18] Builder started [00:21:49] [16] [00:00:00] Building devel/libdaemon | libdaemon-0.14_1 [00:21:49] [29] [00:15:17] Builder started [00:21:49] [20] [00:15:18] Builder started [00:21:49] [41] [00:15:17] Builder started [00:21:49] [29] [00:00:00] Building textproc/libunibreak | = libunibreak-5.1,1 [00:21:49] [20] [00:00:00] Building graphics/poppler-data | = poppler-data-0.4.12 [00:21:49] [35] [00:15:17] Builder started [00:21:49] [41] [00:00:00] Building archivers/libmspack | = libmspack-0.11alpha . . . [main-amd64-bulk_a-default] [2023-09-03_20h48m38s] [parallel_build:] = Queued: 34588 Built: 438 Failed: 1 Skipped: 50 Ignored: 335 = Fetched: 0 Tobuild: 33764 Time: 01:21:30 . . . ^C[01:21:57] [32] [00:07:04] Finished devel/p5-Test-Deep | = p5-Test-Deep-1.204: Success [01:21:57] Error: Signal SIGINT caught, cleaning up and exiting [01:22:03] [39] [00:06:01] Finished textproc/p5-Lingua-Stem-Ru | = p5-Lingua-Stem-Ru-0.04: Success [01:22:04] [35] [00:06:09] Finished devel/p5-ExtUtils-InstallPaths | = p5-ExtUtils-InstallPaths-0.012: Success [main-amd64-bulk_a-default] [2023-09-03_20h48m38s] [sigint:] Queued: = 34588 Built: 442 Failed: 1 Skipped: 50 Ignored: 335 Fetched: = 0 Tobuild: 33760 Time: 01:21:50 [01:22:04] Logs: = /usr/local/poudriere/data/logs/bulk/main-amd64-bulk_a-default/2023-09-03_2= 0h48m38s [01:22:11] Cleaning up (So only around 438 built in similar time frame relationship used for the -J128 test that got to more like 752. But -J128 had the "witness exhausted" status as well.) Turns out a measure of progress is the USED listed by zfs list for /usr/local/poudriere/data/.m/ . It spends lots of time waiting in various processes during its deletion activity. Previously I'd been told to use vfs.zfs.per_txg_dirty_frees_percent=3D5 instead of the default (30) in order to avoid ending up with sustained very small load averages in my poudiere bulk runs for my kind of context. (5 is the older default, as it turns out.) This may be somewhat of a deletion/cleanup stage variant of that sort of issue. It may be that trying a factor of 2+ for 32 hardware threads just does not scale the same as a factor of 2 did for folks using 4 hardware thread machines where testing for the deadlock issue. May be -J36 (so: 32+4) would be more reasonable for deadlock testing for this context, possibly avoiding running into these other issues so strongly. =3D=3D=3D Mark Millard marklmi at yahoo.com