Re: main [and, likely, stable/14]: do not set vfs.zfs.bclone_enabled=1 with that zpool feature enabled because it still leads to panics

From: Mark Millard <marklmi_at_yahoo.com>
Date: Sat, 09 Sep 2023 04:54:14 UTC
On Sep 8, 2023, at 18:19, Mark Millard <marklmi@yahoo.com> wrote:

> On Sep 8, 2023, at 17:03, Mark Millard <marklmi@yahoo.com> wrote:
> 
>> On Sep 8, 2023, at 15:30, Martin Matuska <mm@FreeBSD.org> wrote:
>> 
>>> I can confirm that the patch fixes the panic caused by the provided script on my test systems.
>>> Mark, would it be possible to try poudriere on your system with a patched kernel?
>> 
>> . . .
>> 
>> On 9. 9. 2023 0:09, Alexander Motin wrote:
>>> On 08.09.2023 09:52, Martin Matuska wrote:
>>>> . . .
>>> 
>>> Thank you, Martin.  I was able to reproduce the issue with your script and found the cause.
>>> 
>>> I first thought the issue was triggered by `cp`, but it turned out to be triggered by `cat`.  `cat` also got copy_file_range() support, but later than `cp`.  That is probably why it slipped through testing.  This patch fixes it for me: https://github.com/openzfs/zfs/pull/15251 .
>>> 
>>> Mark, could you please try the patch?
>> 
>> If all goes well, this will end up reporting that the
>> poudriere bulk -a is still running but has gotten past,
>> say, 320+ port->package builds finished (so: more than
>> double observed so far for the panic context). Later
>> would be a report with a larger figure. A normal run
>> I might let go for 6000+ ports and 10 hr or so.
>> 
>> Notes as I go . . .
>> 
>> Patch applied, built, and installed to the test media.
>> Also, booted:
>> 
>> # uname -apKU
>> FreeBSD amd64-ZFS 15.0-CURRENT FreeBSD 15.0-CURRENT amd64 1500000 #75 main-n265228-c9315099f69e-dirty: Thu Sep  7 13:28:47 PDT 2023     root@amd64-ZFS:/usr/obj/BUILDs/main-amd64-dbg-clang/usr/main-src/amd64.amd64/sys/GENERIC-DBG amd64 amd64 1500000 1500000
>> 
>> Note that this is with a debug kernel (-dbg- in path and -DBG in
>> the GENERIC* name). Also, the vintage of what it is based on has:
>> 
>> git: 969071be938c - main - vfs: copy_file_range() between multiple mountpoints of the same fs type
>> 
>> The usual sort of sequencing previously reported to get to this
>> point. Media update starts with the rewind to the checkpoint in
>> hopes of avoiding oddities from the later failure.
>> 
>> . . . :
>> 
>> [main-amd64-bulk_a-default] [2023-09-08_16h31m51s] [parallel_build:] Queued: 34588 Built: 414   Failed: 0     Skipped: 39    Ignored: 335   Fetched: 0     Tobuild: 33800  Time: 00:30:41
>> 
>> 
>> So 414 and still building.
>> 
>> More later. (It may be a while.)
>> 
> 
> [main-amd64-bulk_a-default] [2023-09-08_16h31m51s] [parallel_build:] Queued: 34588 Built: 2013  Failed: 2     Skipped: 179   Ignored: 335   Fetched: 0     Tobuild: 32059  Time: 01:42:47
> 
> and still going. (FYI: The failures are expected.)
> 
> After a while I might stop it and start over with a non-debug
> kernel installed instead.

I did ^C after 2.5 hr (with 2447 built):

^C[02:30:05] Error: Signal SIGINT caught, cleaning up and exiting
[main-amd64-bulk_a-default] [2023-09-08_16h31m51s] [sigint:] Queued: 34588 Built: 2447  Failed: 5     Skipped: 226   Ignored: 335   Fetched: 0     Tobuild: 31575  Time: 02:29:59
[02:30:05] Logs: /usr/local/poudriere/data/logs/bulk/main-amd64-bulk_a-default/2023-09-08_16h31m51s
[02:30:05] Cleaning up
[02:38:04] Unmounting file systems
Exiting with status 1

I'll switch it over to a non-debug kernel and, probably, world
and set up and run another test.

. . . (time goes by) . . .

Hmm. This did not get sent when I wrote the above. FYI, non-debug
test status:

[main-amd64-bulk_a-default] [2023-09-08_19h51m52s] [parallel_build:] Queued: 34588 Built: 2547  Failed: 5     Skipped: 239   Ignored: 335   Fetched: 0     Tobuild: 31462  Time: 01:59:58

I may let it run overnight.
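
For anyone wanting to compare such status lines across runs, here is
a minimal sketch in Python. It is not part of poudriere: the function
names (parse_status, built_per_hour) are hypothetical, and the line
layout is assumed only from the samples quoted in this message. It
pulls out the counters and the elapsed time so that the debug and
non-debug runs can be compared as a builds-per-hour figure.

```python
import re

# Counter names are taken from the status lines quoted in this message.
# The overall line layout is an assumption based on those samples only.
FIELD_RE = re.compile(
    r"(Queued|Built|Failed|Skipped|Ignored|Fetched|Tobuild):\s*(\d+)")
TIME_RE = re.compile(r"Time:\s*(\d+):(\d\d):(\d\d)")


def parse_status(line):
    """Return the counters (as ints) plus elapsed seconds from one line."""
    status = {name: int(value) for name, value in FIELD_RE.findall(line)}
    match = TIME_RE.search(line)
    if match:
        hours, minutes, seconds = (int(part) for part in match.groups())
        status["Seconds"] = hours * 3600 + minutes * 60 + seconds
    return status


def built_per_hour(status):
    """Average number of packages built per hour of elapsed time."""
    return status["Built"] * 3600 / status["Seconds"]


# The two status lines below are copied from this message.
debug = parse_status(
    "[main-amd64-bulk_a-default] [2023-09-08_16h31m51s] [sigint:] "
    "Queued: 34588 Built: 2447  Failed: 5     Skipped: 226   "
    "Ignored: 335   Fetched: 0     Tobuild: 31575  Time: 02:29:59")
nondebug = parse_status(
    "[main-amd64-bulk_a-default] [2023-09-08_19h51m52s] [parallel_build:] "
    "Queued: 34588 Built: 2547  Failed: 5     Skipped: 239   "
    "Ignored: 335   Fetched: 0     Tobuild: 31462  Time: 01:59:58")

print("debug:     %.0f built/hr" % built_per_hour(debug))
print("non-debug: %.0f built/hr" % built_per_hour(nondebug))
```

The rate is only a rough average: port build times vary widely, so the
two figures are comparable only because both runs started from the
same queue.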


===
Mark Millard
marklmi at yahoo.com