From: Paweł Jakub Dawidek
Date: Thu, 13 Apr 2023 19:54:42 +0900
Subject: Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75
To: Cy Schubert
Cc: Mark Millard, Mateusz Guzik, vishwin@freebsd.org, dev-commits-src-main@freebsd.org, Current FreeBSD, pjd@freebsd.org
In-Reply-To: <20230413071032.18BFF31F@slippy.cwsent.com>
List-Id: Commit messages for the main branch of the src repository
List-Archive: https://lists.freebsd.org/archives/dev-commits-src-main

On Apr 13, 2023, at 16:10, Cy Schubert <Cy.Schubert@cschubert.com> wrote:

In message <20230413070426.8A54F25A@slippy.cwsent.com>, Cy Schubert writes:
In message <20230413064252.1E5C1318@slippy.cwsent.com>, Cy Schubert writes:
In message <A291C24C-9D7C-4E79-AD03-68ED910FC2DE@yahoo.com>, Mark Millard writes:
[This just puts my prior reply's material into Cy's
adjusted resend of the original. The To/Cc should
be complete this time.]

On Apr 12, 2023, at 22:52, Cy Schubert <Cy.Schubert@cschubert.com> wrote:

In message <C8E4A43B-9FC8-456E-ADB3-13E7F40B2B04@yahoo.com>, Mark Millard writes:
From: Charlie Li <vishwin_at_freebsd.org> wrote on
Date: Wed, 12 Apr 2023 20:11:16 UTC :

Charlie Li wrote:
Mateusz Guzik wrote:
can you please test poudriere with
https://github.com/openzfs/zfs/pull/14739/files

After applying, on the md(4)-backed pool regardless of block_cloning,
the cy@ `cp -R` test reports no differing (ie corrupted) files. Will
report back on poudriere results (no block_cloning).

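A minimal sketch of the kind of check a `cp -R` test implies, assuming it copies a tree and then compares source and copy byte for byte. The actual cy@ procedure is not shown in this thread, and the function name `differing_files` is hypothetical:

```python
import filecmp
import os


def differing_files(src_root, dst_root):
    """Return relative paths of files under src_root that are missing
    from dst_root or whose bytes differ from the copy in dst_root."""
    mismatches = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(dst_root, rel_dir, name)
            # shallow=False forces a byte-by-byte comparison instead of
            # trusting size and mtime, which silent corruption can leave intact.
            if not os.path.isfile(dst) or not filecmp.cmp(src, dst, shallow=False):
                mismatches.append(os.path.normpath(os.path.join(rel_dir, name)))
    return sorted(mismatches)
```

An empty result corresponds to "no differing (ie corrupted) files"; any path returned is a file that did not survive the copy intact.
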
As for poudriere, build failures are still rolling in. These are (and
have been) entirely random on every run. Some examples from this run:

lang/php81:
- post-install: @${INSTALL_DATA} ${WRKSRC}/php.ini-development
  ${WRKSRC}/php.ini-production ${WRKDIR}/php.conf ${STAGEDIR}/${PREFIX}/etc
- consumers fail to build due to corrupted php.conf packaged

devel/ninja:
- phase: stage
- install -s -m 555
  /wrkdirs/usr/ports/devel/ninja/work/ninja-1.11.1/ninja
  /wrkdirs/usr/ports/devel/ninja/work/stage/usr/local/bin
- consumers fail to build due to corrupted bin/ninja packaged

devel/netsurf-buildsystem:
- phase: stage
- mkdir -p
  /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/makefiles
  /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/testtools
  for M in Makefile.top Makefile.tools Makefile.subdir Makefile.pkgconfig
  Makefile.clang Makefile.gcc Makefile.norcroft Makefile.open64; do \
  cp makefiles/$M
  /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/makefiles/;
  \
  done
- graphics/libnsgif fails to build due to NUL characters in
  Makefile.{clang,subdir}, causing nothing to link

Summary: I have problems building ports into packages
via poudriere-devel use despite being fully updated/patched
(as of when I started the experiment), never having enabled
block_cloning ( still using openzfs-2.1-freebsd ).

In other words, I can confirm other reports that have
been made.

The details follow.


[Written as I was working on setting up for the experiments
and then executing those experiments, adjusting as I went
along.]

I've run my own tests in a context that has never had the
zpool upgrade and that jump from before the openzfs import to
after the existing commits for trying to fix openzfs on
FreeBSD. I report on the sequence of activities getting to
the point of testing as well.

By personal policy I keep my (non-temporary) pools compatible
with what the most recent ??.?-RELEASE supports, using
openzfs-2.1-freebsd for now. The pools involved below have
never had a zpool upgrade from where they started. (I've no
pools that have ever had a zpool upgrade.)

(Temporary pools are rare for me, such as this investigation.
But I'm not testing block_cloning or anything new this time.)

I'll note that I use zfs for bectl, not for redundancy. So
my evidence is more limited in that respect.

The activities were done on a HoneyComb (16 Cortex-A72 cores).
The system has and supports ECC RAM, 64 GiBytes of RAM are
present.

I started by duplicating my normal zfs environment to an
external USB3 NVMe drive and adjusting the host name and such
to produce the below. (Non-debug, although I do not strip
symbols.) :

# uname -apKU
FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #90 main-n261544-cee09bda03c8-dirty: Wed Mar 15 20:25:49 PDT 2023     root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72 arm64 aarch64 1400082 1400082

I then did: git fetch, stash push ., merge --ff-only, stash apply . :
my normal procedure. I then also applied the patch from:

https://github.com/openzfs/zfs/pull/14739/files

Then I did: buildworld buildkernel, install them, and rebooted.

The result was:

# uname -apKU
FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #91 main-n262122-2ef2c26f3f13-dirty: Wed Apr 12 19:23:35 PDT 2023     root@CA72_4c8G_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72 arm64 aarch64 1400086 1400086

The later poudriere-devel based build of packages from ports is
based on:

# ~/fbsd-based-on-what-commit.sh -C /usr/ports
4e94ac9eb97f (HEAD -> main, freebsd/main, freebsd/HEAD) devel/freebsd-gcc12: Bump to 12.2.0.
Author:     John Baldwin <jhb@FreeBSD.org>
Commit:     John Baldwin <jhb@FreeBSD.org>
CommitDate: 2023-03-25 00:06:40 +0000
branch: main
merge-base: 4e94ac9eb97fab16510b74ebcaa9316613182a72
merge-base: CommitDate: 2023-03-25 00:06:40 +0000
n613214 (--first-parent --count for merge-base)

poudriere attempted to build 476 packages, starting
with pkg (in order to build the 56 that I explicitly
indicate that I want). It is my normal set of ports.
The form of building is biased to allowing a high
load average compared to the number of hardware
threads (same as cores here): each builder is allowed
to use the full count of hardware threads. The build
used USE_TMPFS="data" instead of the USE_TMPFS=all I
normally use on the build machine involved.

And it produced some random errors during the attempted
builds. A type of example that is easy to interpret
without further exploration is:

pkg_resources.extern.packaging.requirements.InvalidRequirement: Parse
error at "'\x00\x00\x00\x00\x00\x00\x00\x00'": Expected W:(0-9A-Za-z)
    0
        da0p8     ONLINE       0     0     0

errors: No known data errors


===
Mark Millard
marklmi at yahoo.com


Let's try this again. Claws-mail didn't include the list address in the
header. Trying to reply, again, using exmh instead.


Did your pools suffer the EXDEV problem? The EXDEV also corrupted
files.

As I reported, this was a jump from before the import
to as things are tonight (here). So: NO, unless the
existing code as of tonight still has the EXDEV problem!

Prior to this experiment I'd not progressed any media
beyond: main-n261544-cee09bda03c8-dirty Wed Mar 15 20:25:49.

I think, without sufficient investigation we risk jumping to
conclusions. I've taken an extremely cautious approach, rolling back
snapshots (as much as possible, i.e. poudriere datasets) when EXDEV
corruption was encountered.

Again: nothing between main-n261544-cee09bda03c8-dirty and
main-n262122-2ef2c26f3f13-dirty was involved at any stage.

I did not rollback any snapshots in my MH mail directory. Rolling back
snapshots of my MH maildir would result in loss of email. I have to
live with that corruption. Corrupted files in my outgoing sent email
directory remain:

slippy$ ugrep -cPa '\x00' ~/.Mail/note | grep -c :1
53
slippy$

There are 53 corrupted files in my note log of 9913 emails. Those files
will never be fixed. They were corrupted by the EXDEV bug. Any new ZFS
or ZFS patches cannot retroactively remove the corruption from those
files.

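A rough Python sketch of the same kind of scan, for anyone without ugrep at hand. It reports every file under a directory that contains at least one NUL byte, which is simpler than the per-file match counting the command above does. The function name `files_with_nul` and the chunk size are assumptions for illustration:

```python
import os


def files_with_nul(root, chunk_size=1 << 20):
    """Return the sorted paths of files under root that contain a NUL byte."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                # Read in chunks so large mail files are not loaded whole.
                while True:
                    chunk = fh.read(chunk_size)
                    if not chunk:
                        break
                    if b"\x00" in chunk:
                        hits.append(path)
                        break
    return sorted(hits)
```

Calling `len(files_with_nul(path))` on a mail directory gives a count of suspect files; plain-text mail should normally contain no NUL bytes at all, so any hit is worth inspecting.
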
But my poudriere files, because the snapshots were rolled back, were
"repaired" by the rolled back snapshots.

I'm not convinced that there is presently active corruption since
the problem has been fixed. I am convinced that whatever corruption
that was written at the time will remain forever or until those files
are deleted or replaced -- just like my email files written to disk at
the time.

My test results and procedure just do not fit your conclusion
that things are okay now if block_cloning is completely avoided.

Admitting I'm wrong: sending copies of my last reply to you back to myself,
again and again, three times, I've managed to reproduce the corruption you
are talking about.

This email itself was also corrupted. Below is what was sent. Good thing
multiple copies are saved by exmh.

Admitting I'm wrong: sending copies of my last reply to you back to myself,
again and again, three times, I've managed to reproduce the corruption you
are talking about.

This email itself was also corrupted. Below is what was sent. Good thing
multiple copies are saved by exmh.

Admitting I'm wrong: sending copies of my last reply to you back to myself,
again and again, three times, I've managed to reproduce the corruption you
are talking about.

From my previous email to you.

header. Trying to reply:::::::::, again, using exmh instead.
                       ^^^^^^^^^
Here it is, nine additional bytes of garbage. I've replaced the garbage
with colons because nulls mess up a lot of things, including cut&paste.
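
The same masking can be done mechanically. A small sketch, assuming the garbage is a run of NUL bytes as described above; the helper names `nul_runs` and `mask_nuls` are hypothetical:

```python
import re

# One or more consecutive NUL bytes.
NUL_RUN = re.compile(rb"\x00+")


def nul_runs(data):
    """Return (offset, length) for each run of NUL bytes in data."""
    return [(m.start(), m.end() - m.start()) for m in NUL_RUN.finditer(data)]


def mask_nuls(data, filler=b":"):
    """Replace every NUL byte with filler so the text can be pasted safely."""
    return NUL_RUN.sub(lambda m: filler * len(m.group()), data)
```

Reporting the offset and length of each run alongside the masked text makes it easier to compare corrupted copies of the same message against each other.
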

In another instance about 500 bytes were removed. I can reproduce the
corruption at will now.

The EXDEV patch is applied. Block_cloning is disabled.

Somehow nulls and other garbage are inserted in the middle of emails after
the ZFS upgrade.

Can you please try this patch:

Unfortunately I don’t see how this can happen with block cloning disabled.

-- 
Paweł Jakub Dawidek