Re: main [and, likely, stable/14]: do not set vfs.zfs.bclone_enabled=1 with that zpool feature enabled because it still leads to panics

From: Mark Millard <marklmi_at_yahoo.com>
Date: Sat, 09 Sep 2023 01:19:09 UTC
On Sep 8, 2023, at 17:03, Mark Millard <marklmi@yahoo.com> wrote:

> On Sep 8, 2023, at 15:30, Martin Matuska <mm@FreeBSD.org> wrote:
> 
>> I can confirm that the patch fixes the panic caused by the provided script on my test systems.
>> Mark, would it be possible to try poudriere on your system with a patched kernel?
> 
> . . .
> 
> On 9. 9. 2023 0:09, Alexander Motin wrote:
>> On 08.09.2023 09:52, Martin Matuska wrote:
>>> . . .
>> 
>> Thank you, Martin.  I was able to reproduce the issue with your script and found the cause.
>> 
>> I first thought the issue was triggered by `cp`, but it turned out to be triggered by `cat`.  `cat` also got copy_file_range() support, but later than `cp` did.  That is probably why it slipped through testing.  This patch fixes it for me: https://github.com/openzfs/zfs/pull/15251 .
>> 
>> Mark, could you please try the patch?
> 
> If all goes well, I will end up reporting that the
> poudriere bulk -a is still running but has gotten past,
> say, 320+ finished port->package builds (so: more than
> double what was observed before the earlier panics). Later
> there would be a report with a larger figure. A normal run
> I might let go for 6000+ ports and 10 hr or so.
> 
> Notes as I go . . .
> 
> Patch applied, built, and installed to the test media.
> Also, booted:
> 
> # uname -apKU
> FreeBSD amd64-ZFS 15.0-CURRENT FreeBSD 15.0-CURRENT amd64 1500000 #75 main-n265228-c9315099f69e-dirty: Thu Sep  7 13:28:47 PDT 2023     root@amd64-ZFS:/usr/obj/BUILDs/main-amd64-dbg-clang/usr/main-src/amd64.amd64/sys/GENERIC-DBG amd64 amd64 1500000 1500000
> 
> Note that this is with a debug kernel (-dbg- in path and -DBG in
> the GENERIC* name). Also, the vintage of what it is based on has:
> 
> git: 969071be938c - main - vfs: copy_file_range() between multiple mountpoints of the same fs type
> 
> I used the same sort of sequence as previously reported to get
> to this point. The media update started with a rewind to the
> checkpoint, in hopes of avoiding oddities from the later failure.
> 
> . . . :
> 
> [main-amd64-bulk_a-default] [2023-09-08_16h31m51s] [parallel_build:] Queued: 34588 Built: 414   Failed: 0     Skipped: 39    Ignored: 335   Fetched: 0     Tobuild: 33800  Time: 00:30:41
> 
> 
> So 414 and still building.
> 
> More later. (It may be a while.)
> 

[main-amd64-bulk_a-default] [2023-09-08_16h31m51s] [parallel_build:] Queued: 34588 Built: 2013  Failed: 2     Skipped: 179   Ignored: 335   Fetched: 0     Tobuild: 32059  Time: 01:42:47

and still going. (FYI: The failures are expected.)
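
For anyone wanting to compare such status lines across runs, here is a
minimal sketch that pulls the counters out of a line like the ones quoted
above. It assumes the field layout matches what is shown in this message;
the function name parse_status and the returned key names are made up for
illustration and are not part of poudriere.

```python
import re

# Pattern based only on the status-line layout quoted in this message
# (an assumption: other poudriere versions may format it differently).
STATUS_RE = re.compile(
    r"Queued:\s*(?P<queued>\d+)\s+"
    r"Built:\s*(?P<built>\d+)\s+"
    r"Failed:\s*(?P<failed>\d+)"
    r".*?Tobuild:\s*(?P<tobuild>\d+)\s+"
    r"Time:\s*(?P<h>\d+):(?P<m>\d+):(?P<s>\d+)"
)


def parse_status(line):
    """Return the counters and elapsed seconds from one status line.

    Hypothetical helper for illustration; raises ValueError when the
    line does not match the assumed layout.
    """
    match = STATUS_RE.search(line)
    if match is None:
        raise ValueError("not a recognized status line")
    fields = {name: int(value) for name, value in match.groupdict().items()}
    elapsed_s = fields["h"] * 3600 + fields["m"] * 60 + fields["s"]
    return {
        "queued": fields["queued"],
        "built": fields["built"],
        "failed": fields["failed"],
        "tobuild": fields["tobuild"],
        "elapsed_s": elapsed_s,
    }


def built_per_hour(status):
    """Average number of finished builds per hour so far."""
    return status["built"] * 3600.0 / status["elapsed_s"]
```

Feeding it the two lines from this thread gives the Built counts (414 and
2013) plus the elapsed time in seconds, which is enough to check whether a
run has gotten past the point where the earlier panics happened.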

After a while I might stop it and start over with a non-debug
kernel installed instead.

===
Mark Millard
marklmi at yahoo.com