Re: git: 43741377b143 - main - security/openssl: Security update to 1.1.1n

From: Mark Johnston <markj@freebsd.org>
Date: Sat, 19 Mar 2022 19:54:04 UTC
On Sat, Mar 19, 2022 at 12:00:20PM -0700, Mark Millard wrote:
> On 2022-Mar-19, at 11:07, Thomas Zander <riggs@freebsd.org> wrote:
> 
> > On Sat, 19 Mar 2022 at 18:32, Mark Millard <marklmi@yahoo.com> wrote:
> >> May be report to Mark J. how to run the same test builds
> >> that failed for -p8 but worked for -p7?
> > 
> > Sure, good point.
> > A build that reliably causes broken packages for me on p8, but not on
> > p7, is:
> > 
> > poudriere testport -o multimedia/mplayer -j <13.0-amd64-jail here>
> > 
> > This caused the broken png and python packages when they were built as
> > dependencies.
> > In poudriere.conf I set this:
> > DISTFILES_CACHE=/vcache/distfiles
> > CCACHE_DIR=/vcache/ccache
> > ALLOW_MAKE_JOBS=yes
> > 
> > ALLOW_MAKE_JOBS should increase the number of parallel I/O operations
> > in flight on the pool; maybe that increases the likelihood of
> > triggering the issue?
> > DISTFILES_CACHE and CCACHE_DIR point into the same ZFS pool as
> > /poudriere; I'm not sure whether that is relevant.
> > The zfs pool is a single disk, no raid, mirror or anything fancy.
> 
> On a ThreadRipper 1950X with PCIe Optane storage and 128 GiBytes of
> RAM, I've used bectl to boot the 13.0-RELEASE-p8 environment and
> have started:
> 
> poudriere testport -o multimedia/mplayer -j13_0R-amd64-bulk_a
> 
> where the jail had nothing built in it at the start. So:
> 
> [00:00:08] Building 271 packages using up to 32 builders
> 
> The primary difference is that I've never used ccache and
> did not try to do so here. The "zfs pool is a single disk,
> no raid, mirror or anything fancy" is accurate, as is the
> use of ALLOW_MAKE_JOBS=yes.
> 
> That did not take long . . .
> 
> That proves that ccache is not required. Also, some files seem to get
> only small blocks of zero bytes while others get large ones, but I
> have not checked whether the null characters are at the end of the
> file or earlier in it.
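
For what it's worth, one quick way to check that (untested, and the path
below is only a hypothetical stand-in for one of the broken files):

f=/usr/local/lib/libpng16.so.16   # hypothetical stand-in for a broken file
total=$(wc -c < "$f")
nonnul=$(tr -d '\000' < "$f" | wc -c)
echo "NUL bytes: $((total - nonnul)) of $total"
# If the NULs form a single run at the very end of the file, the first
# $nonnul bytes contain none, so this prints the same number as $nonnul:
head -c "$nonnul" "$f" | tr -d '\000' | wc -c
# To eyeball where the zero runs sit, hex-dump with decimal offsets; od
# collapses repeated all-zero lines into "*", so runs are easy to spot:
od -A d -t x1 "$f" | less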

I am still not able to reproduce it.  I think it's indeed a concurrency
problem, and I found a possible culprit.  Mark or Thomas, if you're able
to build a new kernel from the releng/13.0 branch and test it, could you
please try this patch?

diff --git a/sys/contrib/openzfs/module/zfs/dnode.c b/sys/contrib/openzfs/module/zfs/dnode.c
index 8592c5f8c3a9..b69ba68ec780 100644
--- a/sys/contrib/openzfs/module/zfs/dnode.c
+++ b/sys/contrib/openzfs/module/zfs/dnode.c
@@ -1661,7 +1661,7 @@ dnode_is_dirty(dnode_t *dn)
 	mutex_enter(&dn->dn_mtx);
 
 	for (int i = 0; i < TXG_SIZE; i++) {
-		if (list_head(&dn->dn_dirty_records[i]) != NULL) {
+		if (multilist_link_active(&dn->dn_dirty_link[i])) {
 			mutex_exit(&dn->dn_mtx);
 			return (B_TRUE);
 		}