Re: main [and, likely, stable/14]: do not set vfs.zfs.bclone_enabled=1 with that zpool feature enabled because it still leads to panics

From: Tomoaki AOKI <junchoon_at_dec.sakura.ne.jp>
Date: Sat, 09 Sep 2023 02:30:06 UTC

On Fri, 8 Sep 2023 17:03:07 -0700
Mark Millard <marklmi@yahoo.com> wrote:

> On Sep 8, 2023, at 15:30, Martin Matuska <mm@FreeBSD.org> wrote:
> 
> > I can confirm that the patch fixes the panic caused by the provided script on my test systems.
> > Mark, would it be possible to try poudriere on your system with a patched kernel?
> 
> . . .
> 
> On 9. 9. 2023 0:09, Alexander Motin wrote:
> > On 08.09.2023 09:52, Martin Matuska wrote:
> >> . . .
> > 
> > Thank you, Martin.  I was able to reproduce the issue with your script and found the cause.
> > 
> > > I first thought the issue was triggered by `cp`, but it turned out to be triggered by `cat`.  `cat` also got copy_file_range() support, but later than `cp`.  That is probably why it slipped through testing.  This patch fixes it for me: https://github.com/openzfs/zfs/pull/15251 .
> > 
> > Mark, could you please try the patch?
> 
> If all goes well, this will end up reporting that the
> poudriere bulk -a is still running but has gotten past,
> say, 320+ port->package builds finished (so: more than
> double observed so far for the panic context). Later
> would be a report with a larger figure. A normal run
> I might let go for 6000+ ports and 10 hr or so.
> 
> Notes as I go . . .
> 
> Patch applied, built, and installed to the test media.
> Also, booted:
> 
> # uname -apKU
> FreeBSD amd64-ZFS 15.0-CURRENT FreeBSD 15.0-CURRENT amd64 1500000 #75 main-n265228-c9315099f69e-dirty: Thu Sep  7 13:28:47 PDT 2023     root@amd64-ZFS:/usr/obj/BUILDs/main-amd64-dbg-clang/usr/main-src/amd64.amd64/sys/GENERIC-DBG amd64 amd64 1500000 1500000
> 
> Note that this is with a debug kernel (-dbg- in path and -DBG in
> the GENERIC* name). Also, the vintage of what it is based on has:
> 
> git: 969071be938c - main - vfs: copy_file_range() between multiple mountpoints of the same fs type
> 
> The usual sort of sequencing previously reported to get to this
> point. Media update starts with the rewind to the checkpoint in
> hopes of avoiding oddities from the later failure.
> 
> . . . :
> 
> [main-amd64-bulk_a-default] [2023-09-08_16h31m51s] [parallel_build:] Queued: 34588 Built: 414   Failed: 0     Skipped: 39    Ignored: 335   Fetched: 0     Tobuild: 33800  Time: 00:30:41
> 
> 
> So 414 and still building.
> 
> More later. (It may be a while.)
> 
> ===
> Mark Millard
> marklmi at yahoo.com

Is this planned to be MFC'ed to stable/14, and then to releng/14.0, once
it is MFV'ed to main?

Regards.

-- 
Tomoaki AOKI    <junchoon@dec.sakura.ne.jp>