From nobody Sat Apr 15 22:49:32 2023 X-Original-To: freebsd-current@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4PzT644kZVz45DyQ for ; Sat, 15 Apr 2023 22:49:48 +0000 (UTC) (envelope-from marklmi@yahoo.com) Received: from sonic312-24.consmr.mail.gq1.yahoo.com (sonic312-24.consmr.mail.gq1.yahoo.com [98.137.69.205]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4PzT634VSLz3nCn for ; Sat, 15 Apr 2023 22:49:47 +0000 (UTC) (envelope-from marklmi@yahoo.com) Authentication-Results: mx1.freebsd.org; dkim=pass header.d=yahoo.com header.s=s2048 header.b=EME8USLL; spf=pass (mx1.freebsd.org: domain of marklmi@yahoo.com designates 98.137.69.205 as permitted sender) smtp.mailfrom=marklmi@yahoo.com; dmarc=pass (policy=reject) header.from=yahoo.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s2048; t=1681598985; bh=QFriFYybrt/DkPHkifqdhSpzObbt2yFMQDV+B0+0Gpk=; h=Subject:From:In-Reply-To:Date:Cc:References:To:From:Subject:Reply-To; b=EME8USLLQlz/XvV78ekYaCznMRKJjPLkW8FW51e6QnVoPwi49dygbbdxUWqWmPSB6qFfJZXYP5zgDG3kcY44VwEBIUYXlnVz5dB6Agw/S/mOXfuD8bJXhoWmdNmivSDwI6aeeYx/Xuf6Bsnr1CqWBqZIRIeQJ4BEXyfmXsws04MzQQWrxlD5pWINULJGOT/0TwzW/G2JFHIHtKXrpaXi0IDpX4+P0oHyWmcwQXl9B6jmwsdn14rnMglgS1GZJbM+lFp1N0kV5mnt8kZJWFLudkT4bzyHYoCDuO8bnRRbEaTbyUE6f1eHttDFYDeOIXT8z52I4Ld4MhTrkM+FdezfYQ== X-SONIC-DKIM-SIGN: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s2048; t=1681598985; bh=4LDDa5fezD0TEFFGxFNdBW3YyfZO9VxcraiwXPrPHgj=; h=X-Sonic-MF:Subject:From:Date:To:From:Subject; b=f51jeM6wt39jVRA5Ytk5To2KF1iRsFyePNxq04CsP6BtwR8b+l4dqu7/uPJ1jue9Ug1Fiau+yYkPl8ij9qsUxCjgpSSlgMb/49DwrJ7W6DOO7sIbuRJMz41sh7Y24nHcRbLjNMX461fxSg3uSpuY249R5tvltkjCRZ942ZPgec+TgCEeKugrTNJ7hWal6cylWEPt7I4eu+T3Pnq7Dx9ixcD+pprDkQsXM6N0jh4o3Fz0hYtGs0f3QnwKRE8jcYSor+W/Wy83JZoZkl8fOZldbAuHVubTc0KCI0f2P638hAOw4eMvaCegr3mZFDmWiz68YIIlK0QrxtbUgm/lkMqdxQ== X-YMail-OSG: 0K9_Kc4VM1lT3EgQBtL79GXSxYqrm6pvLoT0J7u3pFAIwL9JfbXmmTonepFPUO0 15BslKRImTRoWMqazFCTGBupxaD2eF1R._A5Wdf39zD22PrqzjRbLbQkbJzB_.dtRDiWIYfCQWew 9FnRKkD_VaiwRtt6TdrH6NJg5EDqdGqUUMdobsW56IewcUsbxbKg8_bqeGSpWLSI2rxFRejKRWaI ivTudRpKp2Lwvlx3QCwG.RxHtnjPSxaSxAq4WosZofBqXRDYdOMMTo7k7bSCHKcJo3mt.TxXBkc7 7UzXo4gnVochJ3o2PUmqA0rf3NLD.ZznQ9MN3SiBy7HEzrjnSSxv7QmInWdxBem82LFf873U2oU4 sBStdoi6cbon6GEMUDvVZuw4TGHSNy4Hz9cVowmBmxAwGJwcH0ZOpkkDKYbi5cfQxwQq8U6YlX5E vIupxm5jg9JzOOiD8eiCXnT8BWBc4w2H8T25_AE7JGxiuyZELtnnh7tp4WC9Pb0l5.7OXqlz7Xq8 C55lw9BFUi7HijHVU.9dmCsmJrHICGTte4Q3MjgmuHnoZtnQXCnwy8qd_Jneas8s5BuHH4KDpIAr cEwmQoyaIOL7XRI1_nutoSZvj82h.wigQNtsji2bwWdODgNbOf5ApLtlLFnWmAajEQOHytXXRmGq jmdrxehBE3Q6tDeyZ1HV5doXd_Ln4hgh_8xPl4HZE.RDj0kLLzxwhZ4vrkKLWswg3KOlkRA4Omrj TA7Pfkdk3lJIxTc4_Crk7omPehXbJtoFPKJLqad18ZoPjscYjrY0AAugale9mluLH5AQG3K9rh_Q j9W3GQB.dmv8OhVS4Y1zxjX5vmpXqQAnhfiE199f3eqdc0X0MXEE1DIwpZQ9cFSPqxLhposxiir9 wGFkxBLu9jFiVUYaW0KvV1u51hGrHkoHszYcwydIKQT9OnPt.gcG8dU6CateBGam3ItkyUrbdvGz Bf3scAL_YDRzzLAmZ0vAf5kFutQOfIlGfZiyIa2.S86QLLy8QjNB.QGRoAyywZaOHZBYDg2sVlMF NJD6BKIyLKDQwgh9Keblf1PDm1tM_Te5NemtCe7CBu6AbFB_UfEsKzmeF7v5O1kx.Hjk2e2JeK81 SreiU8agAcuIz8LrMb1pbMzMTREvHxorWs0uN0_px1Qdycvf5U5g.xDL5jnL0MN6SQ8LHmPotCZk zNn2KC7_36jCc0Mqx_0efVRMbweXXlYNZlFAp0jtVb8Gy1YwoCWoCkOntE1f8yy7R8AHTx7Krc1P c5dqVFYjSTP1kanWz1Xhg8MWdjMTmvods7ZmLAC6wjjY.JqsBiIJFps0nSzYmJVJ6M2qOumDfIMa TvxdZz.6syET_wxN6SRR1mSvNDbh1mFkXcahfhzWJcI_N9gc8AxkndL09zsSfl9OQNcSxST1zTxX I8FMmaMXMCHJck9Dv8Sqcn5Mdr50bJIJsakVCITacDgVOA6yXSBizkvHJ0zs6SO54Ezaxeqnshru xmtgaqpv2Bkn.baRz0AfI6rEf8Ty2F00HoNPs5dfJqzWlqfOVWxCkQFAxBtMb2f9vELWc27O5t68 sEvU1hxehcV0wScz0ONbth3z_2rDEBSIqTQ3paIMVaF1JdSYzw7CJX7.JrUFJv1OgdItszfTTPoS 0cznKDdt8n7pI4Bd3hO7BEcaXom9E8mSI60CqEEZemvZTuXMZyPRg8gQWvKAb7Nthhg3fUEIZ5Fb wYzNo87b0gPLfN4mGiq8LYrHDmw5CqihiZ5zUxVlBJiP0yq390_iOjp291PmsMJ3BweUWx1aN6xf QfS5Bge2wPWW.tMPgX3OKj27tyjuCtHMgP0May0AaXDLf9rQ3t02XJXj4xVnWk6Aj91jOGl2WVJq dO0t0Pj1QT92mTPSCPcf.C9ZvUAMTKox8we2qOWkb7ZC0DB9iftLlX08SWmV9qMMIuioNJoDRnXg IPllye2kq6XGsnrJKJ_z9qAP7aM1lt8jfdlE_jgP9Fa94EKCyd7P3pDiHLHsY.XCiMoTnTjJIS1E uwDLP2PHIofl.UNGmZgKc_rrl89yfumwJQvWNjecDFAM26BURvelxLV0tuRaWDe7DcPeEyWmsSq_ mHs99X4khE.0TnH8Itc_o3G_hPAONZWr6Gmwj4zlfhRP4mzNn0hPbHkCA0L8vdU7U14.ViomDUe2 a93C_N3Tx5o6O1whv3tAqtbxbN64rKgzUCaNts8NDQFxHY2KrImk4ovZP6sn7fSsxZvAIpEKJ0s9 migHFz.YZXGgKcSbfDEEDirLQbaNHIUJ4MS8Neah1SQYAFpzYm0P.xKMlVVmvS2_Ik_LTHUVQkPN 3Bn.K_C5k X-Sonic-MF: X-Sonic-ID: 0be95355-c582-4799-9fe4-a1c7a69b5fd6 Received: from sonic.gate.mail.ne1.yahoo.com by sonic312.consmr.mail.gq1.yahoo.com with HTTP; Sat, 15 Apr 2023 22:49:45 +0000 Received: by hermes--production-gq1-546798879c-sfj59 (Yahoo Inc. Hermes SMTP Server) with ESMTPA ID f4b3aba42658f92ddcaaff6db7af9bd6; Sat, 15 Apr 2023 22:49:43 +0000 (UTC) Content-Type: text/plain; charset=us-ascii List-Id: Discussions about the use of FreeBSD-current List-Archive: https://lists.freebsd.org/archives/freebsd-current List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-current@freebsd.org Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3731.400.51.1.1\)) Subject: Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75 From: Mark Millard In-Reply-To: <62F8DD62-8E09-4A07-926B-5DE3DB399609@yahoo.com> Date: Sat, 15 Apr 2023 15:49:32 -0700 Cc: FreeBSD User , Cy Schubert , Charlie Li , Pawel Jakub Dawidek , dev-commits-src-main@freebsd.org, Current FreeBSD Content-Transfer-Encoding: quoted-printable Message-Id: <0C286158-8F0D-45D0-8D85-B521278E9518@yahoo.com> References: <20230413071032.18BFF31F@slippy.cwsent.com> <20230413063321.60344b1f@cschubert.com> <20230413135635.6B62F354@slippy.cwsent.com> <319a267e-3f76-3647-954a-02178c260cea@dawidek.net> <441db213-2abb-b37e-e5b3-481ed3e00f96@dawidek.net> <5ce72375-90db-6d30-9f3b-a741c320b1bf@freebsd.org> <99382FF7-765C-455F-A082-C47DB4D5E2C1@yahoo.com> <32cad878-726c-4562-0971-20d5049c28ad@freebsd.org> <20230415115452.08911bb7@thor.intern.walstatt.dynvpn.de> <20230415143625.99388387@slippy.cwsent.com> <20230415175218.777d0a97@thor.intern.walstatt.dynvpn.de> <62F8DD62-8E09-4A07-926B-5DE3DB399609@yahoo.com> To: Mateusz Guzik X-Mailer: Apple Mail (2.3731.400.51.1.1) X-Spamd-Result: default: False [-3.47 / 15.00]; NEURAL_HAM_LONG(-1.00)[-1.000]; NEURAL_HAM_MEDIUM(-1.00)[-1.000]; NEURAL_HAM_SHORT(-0.97)[-0.972]; DMARC_POLICY_ALLOW(-0.50)[yahoo.com,reject]; MV_CASE(0.50)[]; R_DKIM_ALLOW(-0.20)[yahoo.com:s=s2048]; R_SPF_ALLOW(-0.20)[+ptr:yahoo.com]; MIME_GOOD(-0.10)[text/plain]; DWL_DNSWL_NONE(0.00)[yahoo.com:dkim]; ARC_NA(0.00)[]; RCVD_VIA_SMTP_AUTH(0.00)[]; FROM_HAS_DN(0.00)[]; MID_RHS_MATCH_FROM(0.00)[]; TO_MATCH_ENVRCPT_SOME(0.00)[]; RCVD_TLS_LAST(0.00)[]; FREEMAIL_ENVFROM(0.00)[yahoo.com]; RCVD_IN_DNSWL_NONE(0.00)[98.137.69.205:from]; RCVD_COUNT_THREE(0.00)[3]; FREEMAIL_FROM(0.00)[yahoo.com]; TO_DN_SOME(0.00)[]; ASN(0.00)[asn:36647, ipnet:98.137.64.0/20, country:US]; DKIM_TRACE(0.00)[yahoo.com:+]; FREEMAIL_TO(0.00)[gmail.com]; RCPT_COUNT_SEVEN(0.00)[7]; MIME_TRACE(0.00)[0:+]; FROM_EQ_ENVFROM(0.00)[]; MLMMJ_DEST(0.00)[freebsd-current@freebsd.org] X-Rspamd-Queue-Id: 4PzT634VSLz3nCn X-Spamd-Bar: --- X-ThisMailContainsUnwantedMimeParts: N On Apr 15, 2023, at 15:33, Mark Millard wrote: > On Apr 15, 2023, at 13:30, Mateusz Guzik wrote: >=20 >> On 4/15/23, FreeBSD User wrote: >>> Am Sat, 15 Apr 2023 07:36:25 -0700 >>> Cy Schubert schrieb: >>>=20 >>>> In message = <20230415115452.08911bb7@thor.intern.walstatt.dynvpn.de>, >>>> FreeBSD Us >>>> er writes: >>>>> Am Thu, 13 Apr 2023 22:18:04 -0700 >>>>> Mark Millard schrieb: >>>>>=20 >>>>>> On Apr 13, 2023, at 21:44, Charlie Li = wrote: >>>>>>=20 >>>>>>> Mark Millard wrote: >>>>>>>> FYI: in my original report for a context that has never had >>>>>>>> block_cloning enabled, I reported BOTH missing files and >>>>>>>> file content corruption in the poudriere-devel bulk build >>>>>>>> testing. This predates: >>>>>>>> https://people.freebsd.org/~pjd/patches/brt_revert.patch >>>>>>>> but had the changes from: >>>>>>>> https://github.com/openzfs/zfs/pull/14739/files >>>>>>>> The files were missing from packages installed to be used >>>>>>>> during a port's build. No other types of examples of missing >>>>>>>> files happened. (But only 11 ports failed.) >>>>>>> I also don't have block_cloning enabled. "Missing files" prior = to >>>>>>> brt_rev >>>>> ert may actually >>>>>>> be present, but as the corruption also messes with the file(1) >>>>>>> signature, >>>>> some tools like >>>>>>> ldconfig report them as missing. >>>>>>=20 >>>>>> For reference, the specific messages that were not explicit >>>>>> null-byte complaints were (some shown with a little context): >>>>>>=20 >>>>>>=20 >>>>>> =3D=3D=3D> py39-lxml-4.9.2 depends on shared library: = libxml2.so - not >>>>>> found >>>>>> =3D=3D=3D> Installing existing package = /packages/All/libxml2-2.10.3_1.pkg >>>>>>=20 >>>>>> [CA72_ZFS] Installing libxml2-2.10.3_1... >>>>>> [CA72_ZFS] Extracting libxml2-2.10.3_1: .......... done >>>>>> =3D=3D=3D> py39-lxml-4.9.2 depends on shared library: = libxml2.so - found >>>>>>=20 >>>>>> (/usr/local/lib/libxml2.so) . . . >>>>>> [CA72_ZFS] Extracting libxslt-1.1.37: .......... done >>>>>> =3D=3D=3D> py39-lxml-4.9.2 depends on shared library: = libxslt.so - found >>>>>>=20 >>>>>> (/usr/local/lib/libxslt.so) =3D=3D=3D> Returning to build of >>>>>> py39-lxml-4.9.2 >>>>>> . . . >>>>>> =3D=3D=3D> Configuring for py39-lxml-4.9.2 >>>>>> Building lxml version 4.9.2. >>>>>> Building with Cython 0.29.33. >>>>>> Error: Please make sure the libxml2 and libxslt development = packages >>>>>> are in >>>>> stalled. >>>>>>=20 >>>>>>=20 >>>>>> [CA72_ZFS] Extracting libunistring-1.1: .......... done >>>>>> =3D=3D=3D> libidn2-2.3.4 depends on shared library: = libunistring.so - not >>>>>> found >>>>>=20 >>>>>>=20 >>>>>>=20 >>>>>> [CA72_ZFS] Extracting gmp-6.2.1: .......... done >>>>>> =3D=3D=3D> mpfr-4.2.0,1 depends on shared library: libgmp.so - = not found >>>>>>=20 >>>>>>=20 >>>>>>=20 >>>>>> =3D=3D=3D> nettle-3.8.1 depends on shared library: libgmp.so - = not found >>>>>> =3D=3D=3D> Installing existing package = /packages/All/gmp-6.2.1.pkg >>>>>> [CA72_ZFS] Installing gmp-6.2.1... >>>>>> the most recent version of gmp-6.2.1 is already installed >>>>>> =3D=3D=3D> nettle-3.8.1 depends on shared library: libgmp.so - = not found >>>>>>=20 >>>>>> *** Error code 1 >>>>>>=20 >>>>>>=20 >>>>>> autom4te: error: need GNU m4 1.4 or later: /usr/local/bin/gm4 >>>>>>=20 >>>>>>=20 >>>>>> checking for GNU >>>>>> M4 that supports accurate traces... configure: error: no = acceptable m4 >>>>>> coul >>>>> d be found in >>>>>> $PATH. GNU M4 1.4.6 or later is required; 1.4.16 or newer is >>>>>> recommended. >>>>>> GNU M4 1.4.15 uses a buggy replacement strstr on some systems. >>>>>> Glibc 2.9 - 2.12 and GNU M4 1.4.11 - 1.4.15 have another strstr = bug. >>>>>>=20 >>>>>>=20 >>>>>> ld: error: /usr/local/lib/libblkid.a: unknown file type >>>>>>=20 >>>>>>=20 >>>>>> =3D=3D=3D >>>>>> Mark Millard >>>>>> marklmi at yahoo.com >>>>>>=20 >>>>>>=20 >>>>>=20 >>>>> Hello >>>>>=20 >>>>> whar is the recent status of fixing/mitigate this desatrous bug? >>>>> Especially f >>>>> or those with the >>>>> new option enabled on ZFS pools. Any advice? >>>>>=20 >>>>> In an act of precausion (or call it panic) I shutdown several = servers to >>>>> prev >>>>> ent irreversible >>>>> damages to databases and data storages. We face on one host with >>>>> /usr/ports r >>>>> esiding on ZFS >>>>> always errors on the same files created while staging (using = portmaster, >>>>> leav >>>>> es the system >>>>> with noninstalled software, i.e. www/apache24 in our case). = Deleting the >>>>> work >>>>> folder doesn't >>>>> seem to change anything, even when starting a scrubbing of the = entire >>>>> pool (R >>>>> AIDZ1 pool) - >>>>> cause unknown, why it affects always the same files to be = corrupted. >>>>> Same wit >>>>> h deve/ruby-gems. >>>>>=20 >>>>> Poudriere has been shutdown for the time being to avoid further = issues. >>>>>=20 >>>>>=20 >>>>> Are there any advies to proceed apart from conserving the boxes = via >>>>> shutdown? >>>>>=20 >>>>> Thank you ;-) >>>>> oh >>>>>=20 >>>>>=20 >>>>>=20 >>>>> -- >>>>> O. Hartmann >>>>=20 >>>> With an up-to-date tree + pjd@'s "Fix data corruption when cloning >>>> embedded >>>> blocks. #14739" patch I didn't have any issues, except for email = messages >>>>=20 >>>> with corruption in my sent directory, nowhere else. I'm still >>>> investigating >>>> the email messages issue. IMO one is generally safe to run = poudriere on >>>> the >>>> latest ZFS with the additional patch. >>>>=20 >>>> My tests of the additional patch concluded that it resolved my last >>>> problems, except for the sent email problem I'm still = investigating. I'm >>>> sure there's a simple explanation for it, i.e. the email thread was >>>> corrupted by the EXDEV regression which cannot be fixed by = anything, even >>>>=20 >>>> reverting to the previous ZFS -- the data in those files will = remain >>>> damaged regardless. >>>>=20 >>>> I cannot speak to the others who have had poudriere and other = issues. I >>>> never had any problems with poudriere on top of the new ZFS. >>>>=20 >>>> WRT reverting block_cloning pools to without, your only option is = to >>>> backup >>>> your pool and recreate it without block_cloning. Then restore your = data. >>>>=20 >>>>=20 >>>=20 >>> All right, I interpret the answer that way, that I need a most = recent source >>> tree (and >>> accordingly built and installed OS) AND a patch that isn't = officially >>> commited? >>>=20 >>> On a box I'm with: >>>=20 >>> FreeBSD 14.0-CURRENT #8 main-n262175-5ee1c90e50ce: Sat Apr 15 = 07:57:16 CEST >>> 2023 amd64 >>>=20 >>> The box is crashing while trying to update ports with the well known = issue: >>>=20 >>> Panic String: VERIFY(!zil_replaying(zilog, tx)) failed >>>=20 >>> At the moment all alarm bells are ringing and I lost track about = what has >>> been patched and >>> already commited and what is still as patch available but in the = phase of >>> evaluation or >>> inofficially emmited here. >>>=20 >>> According to the EXDEV issue: in cases of poudriere or ports trees = on ZFS, >>> what do I have to >>> do to ensure that those datasets are clean? The OS should detect = file >>> corruption but in my >>> case the box is crashing :-( >>>=20 >>> I did several times scrubbing, but this seems to be the action of a = helpless >>> and desperate man >>> ... ;-/ >>>=20 >>> Greetings >>>=20 >>=20 >> Using block cloning is still not safe, but somewhere in this thread >> pjd had a patch to keep it operatinal for already cloned files = without >> adding new ones. >>=20 >> Anyhow, as was indicated by vishwin@ there was data corruption >> *unrelated* to block cloning which also came with the import, I >> narrowed it down: https://github.com/openzfs/zfs/issues/14753 >>=20 >> That said now I'm testing a kernel which does not do block cloning = and >> does not have the other problematic commit, we will see if things >> work. >=20 > (Mostly written as I progressed but some material later > inserted into/around previously written material.) >=20 > Summary: >=20 > As stands, it looks like reverting the dnode_is_dirty > code is what fixes the corruptions that my type of > test context produced via poudriere bulk activity . >=20 >=20 > The details that lead to that summary . . . >=20 > Using my my build environment for updating my temporary, > experimental context, an environment running a world and > and kernel that predate the import: >=20 > # uname -apKU > FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #90 = main-n261544-cee09bda03c8-dirty: Wed Mar 15 20:25:49 PDT 2023 = root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm6= 4.aarch64/sys/GENERIC-NODBG-CA72 arm64 aarch64 1400082 1400082 >=20 > (Note the "nondbg": I normally run non-debug main builds, > but with symbols not stripped.) >=20 > The kernel and world for this are what is in old-main-CA72: >=20 > # bectl list > BE Active Mountpoint Space Created > main-CA72 R - 3.98G 2023-04-12 20:29 > old-main-CA72 N / 1.08M 2023-02-06 19:44 >=20 > (Most everything else is outside the BE's and so is shared > across the BE's.) >=20 > I updated to also have (whitespace details likely > not preserved in this note): >=20 > # git -C /usr/main-src/ diff = /usr/main-src/sys/contrib/openzfs/module/zfs/dnode.c > diff --git a/sys/contrib/openzfs/module/zfs/dnode.c = b/sys/contrib/openzfs/module/zfs/dnode.c > index 367bfaa80726..49a7f59c0da4 100644 > --- a/sys/contrib/openzfs/module/zfs/dnode.c > +++ b/sys/contrib/openzfs/module/zfs/dnode.c > @@ -1772,17 +1772,7 @@ dnode_is_dirty(dnode_t *dn) > { > mutex_enter(&dn->dn_mtx); > for (int i =3D 0; i < TXG_SIZE; i++) { > - list_t *list =3D &dn->dn_dirty_records[i]; > - for (dbuf_dirty_record_t *dr =3D list_head(list); > - dr !=3D NULL; dr =3D list_next(list, dr)) { > - if (dr->dr_dbuf =3D=3D NULL || > - (dr->dr_dbuf->db_blkid !=3D = DMU_BONUS_BLKID && > - dr->dr_dbuf->db_blkid !=3D = DMU_SPILL_BLKID)) { > - mutex_exit(&dn->dn_mtx); > - return (B_TRUE); > - } > - } > - if (dn->dn_free_ranges[i] !=3D NULL) { > + if (multilist_link_active(&dn->dn_dirty_link[i])) { > mutex_exit(&dn->dn_mtx); > return (B_TRUE); > } >=20 >=20 >=20 >=20 > I did my usual buildworld buildkernel sequence and then > one of my normal install sequences into main-CA72 to > update it to have the change, as well as the prior > material involved in my first experiment that I'd > reported on. >=20 > I cleared the content of the jail that I use for > temporary experiments, such as the prior testing that > got the 11 builder failures: >=20 > # poudriere pkgclean -jmain-CA72-bulk_a -A >=20 > I then rebooted using the updated main-CA72 BE. >=20 > Then I started the: >=20 > # poudriere bulk -jmain-CA72-bulk_a -w -f ~/origins/CA72-origins.txt > . . . > [00:00:37] Building 476 packages using up to 16 builders > [00:00:37] Hit CTRL+t at any time to see build progress and stats > [00:00:38] [01] [00:00:00] Builder starting > [00:00:40] [01] [00:00:02] Builder started > [00:00:40] [01] [00:00:00] Building ports-mgmt/pkg | pkg-1.19.1_1 >=20 > In the prior experiment it got: >=20 > 476 =3D 252 success + 11 failed + 213 skipped >=20 > and it reported the time for that as: 00:37:52. >=20 > A normal from-scratch build takes many hours (multiple > compiler toolchains and such) so my first report after > this point will be for one of: >=20 > A) It got to, say, 00:40:00 or beyond with, or without > failures. > vs. > B) It got failures and stopped before that. >=20 > . . . TIME GOES BY . . . >=20 > At about 00:40:00 the status was: >=20 > [00:40:00] [06] [00:00:00] Building x11/libXv | libXv-1.0.12,1 > load: 30.73 cmd: sh 1508 [nanslp] 2400.88r 6.69u 11.90s 0% 3960k > [main-CA72-bulk_a-default] [2023-04-15_14h47m19s] [parallel_build:] = Queued: 476 Built: 235 Failed: 0 Skipped: 0 Ignored: 0 Fetched: 0 = Tobuild: 241 Time: 00:40:01 > ID TOTAL ORIGIN PKGNAME = PHASE PHASE TMPFS CPU% MEM% > [15] 00:07:44 devel/py-lxml@py39 | py39-lxml-4.9.2 = stage 00:00:08 40.00 KiB 0% 0% > [01] 00:00:34 x11/libXxf86vm | libXxf86vm-1.1.4_3 = build-depends 00:00:03 56.00 KiB 2.3% 0% > [16] 00:01:59 x11-toolkits/libXt | libXt-1.2.1,1 = configure 00:00:52 40.00 KiB 0.3% 0% > [02] 00:01:40 devel/dbus | dbus-1.14.6,1 = configure 00:00:05 36.00 KiB 0.5% 0% > [03] 00:02:20 x11/libXrender | libXrender-0.9.10_2 = configure 00:01:27 40.00 KiB 0% 0% > [04] 00:00:44 graphics/libglvnd | libglvnd-1.6.0 = build-depends 00:00:13 52.00 KiB 20.3% 0.1% > [05] 00:01:39 x11/xprop | xprop-1.2.6 = configure 00:00:45 56.00 KiB 0.7% 0.2% > [06] 00:00:14 x11/libXv | libXv-1.0.12,1 = pkg-depends 00:00:03 52.00 KiB 3.6% 0% > [07] 00:01:57 x11/libXfixes | libXfixes-6.0.0 = configure 00:00:42 40.00 KiB 0.1% 0% > [08] 00:03:01 devel/glib20 | glib-2.76.1,2 = configure 00:01:26 40.00 KiB 4.3% 0.1% > [09] 00:01:21 shells/bash-completion | bash-completion-2.11_2,2 = configure 00:00:13 32.00 KiB 5.7% 0% > [10] 00:06:26 devel/qt5-buildtools | qt5-buildtools-5.15.8p157 = package 00:01:57 44.00 KiB 76.1% 0.1% > [11] 00:01:20 print/psutils | psutils-1.17_5 = stage 00:00:03 40.00 KiB 0.2% 0% > [12] 00:02:09 x11/libxkbfile | libxkbfile-1.1.0 = configure 00:01:22 44.00 KiB 0.1% 0% > [13] 00:08:54 devel/cmake-core | cmake-core-3.25.1 = build 00:01:43 36.00 KiB 694.9% 5.2% > [14] 00:01:20 x11/xcb-util-image | xcb-util-image-0.4.0_1 = configure 00:00:14 48.00 KiB 0% 0% > [00:40:14] Logs: = /usr/local/poudriere/data/logs/bulk/main-CA72-bulk_a-default/2023-04-15_14= h47m19s > [00:40:22] [11] [00:01:28] Finished print/psutils | psutils-1.17_5: = Success >=20 > So no failures so far, 235 ports built and a bunch in process. >=20 > Note: maximum observed ("MaxObs") load averages so far > as shown below (for there being only 16 cores, i.e., > 16 Cortex-A72 threads, one per core): >=20 > load averages: 43.75, 34.32, 27.40 MaxObs: 45.03, 34.32, 27.40 >=20 > I'll report again if it gets a corruption failure. > Otherwise it will be many hours before it would > complete without such failures (multiple compiler > toolchain builds and such). >=20 To be explicit, since I've now seen your commit: My test does not include the change (whitespace details not preserved in this note) @@ -2650,9 +2641,7 @@ dnode_next_offset(dnode_t *dn, int flags, uint64_t = *offset, rw_enter(&dn->dn_struct_rwlock, RW_READER); if (dn->dn_phys->dn_nlevels =3D=3D 0) { - if (!(flags & DNODE_FIND_HOLE)) { - error =3D SET_ERROR(ESRCH); - } + error =3D SET_ERROR(ESRCH); goto out; } I make no claims about which way dnode_next_offset should be. I was only providing some independent evidence that might prove of some use to folks that understand the alternative code sequences. For reference, at about the 1 hr point for what I'm testing: [00:57:49] [01] [00:00:51] Finished sysutils/u-boot-tools | = u-boot-tools-2022.10: Success [main-CA72-bulk_a-default] [2023-04-15_14h47m19s] [parallel_build:] = Queued: 476 Built: 306 Failed: 0 Skipped: 0 Ignored: 0 Fetched: 0 = Tobuild: 170 Time: 00:59:49 ID TOTAL ORIGIN PKGNAME PHASE PHASE = TMPFS CPU% MEM% [06] 00:16:54 devel/binutils@native | binutils-2.40_2,1 package 00:04:22 = 56.00 KiB 100% 0.2% [09] 00:15:50 lang/ruby31 | ruby-3.1.3_2,1 package 00:00:28 = 40.00 KiB 100% 0.2% [13] 00:28:43 devel/cmake-core | cmake-core-3.25.1 package 00:14:20 = 36.00 KiB 100% 0.2% [01:00:03] Logs: = /usr/local/poudriere/data/logs/bulk/main-CA72-bulk_a-default/2023-04-15_14= h47m19s [01:00:06] [06] [00:16:57] Finished devel/binutils@native | = binutils-2.40_2,1: Success =3D=3D=3D Mark Millard marklmi at yahoo.com