Re: Is 14.0 to be released based on 0 for sysctl vfs.zfs.bclone_enabled ?

From: Martin Matuska <mm_at_FreeBSD.org>
Date: Fri, 10 Nov 2023 14:31:38 UTC

Hi Xin,

Since when have you been using block cloning on the system? Is it 
possible that there is already corrupted block-cloned data from the 
past? Is everything on one dataset or are you using multiple datasets 
for /usr/src and /usr/obj?

Best regards,
mm

On 10. 11. 2023 8:04, Xin Li wrote:
> On 2023-11-05 16:34, Martin Matuska wrote:
>> OpenZFS 2.2.0 in FreeBSD 14 fully supports block cloning. You can 
>> work with pools that have feature@block_cloning enabled.
>> The sysctl variable vfs.zfs.bclone_enabled affects the behavior of 
>> zfs_clone_range() which is called by copy_file_range(). When it is 
>> set to 0, zfs_clone_range() does not do block cloning.
>> If it is set to anything other than 0, zfs_clone_range() does block 
>> cloning (if all conditions are met - same ZFS pool, correct data 
>> alignment, etc.).
>>
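
For readers who want to exercise this path from userland, here is a minimal
sketch of a whole-file copy through copy_file_range(2). The helper name
copy_whole_file is made up for illustration and is not from this thread.
The call uses NULL offsets, a length of SSIZE_MAX and flags of 0, which
matches the arguments visible in the kern_copy_file_range() frame of the
backtrace below (inoffp=0x0, outoffp=0x0, len=9223372036854775807, flags=0).
Whether the kernel actually clones blocks or falls back to an ordinary copy
is decided inside zfs_clone_range() and is not visible to the caller. The
_GNU_SOURCE define is only an assumption for building on glibc; FreeBSD
should not need it.

```c
/* Only needed to expose copy_file_range() on glibc; not required on FreeBSD. */
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <limits.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): copy src to dst with
 * copy_file_range(2).  Returns the number of bytes copied, or -1 on error.
 */
off_t
copy_whole_file(const char *src, const char *dst)
{
    assert(src != NULL && dst != NULL);

    int in = open(src, O_RDONLY);
    if (in < 0)
        return (-1);

    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        close(in);
        return (-1);
    }

    off_t total = 0;
    for (;;) {
        /* NULL offsets: read and advance each descriptor's own file offset. */
        ssize_t n = copy_file_range(in, NULL, out, NULL, SSIZE_MAX, 0);
        if (n < 0) {
            total = -1;
            break;
        }
        if (n == 0)     /* end of input file */
            break;
        total += n;
    }

    close(in);
    close(out);
    return (total);
}
```

Calling copy_whole_file("a", "b") with both paths on the same pool would go
through zfs_clone_range(); with vfs.zfs.bclone_enabled=0 the same call is
expected to still succeed, just without block cloning.
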
>> In FreeBSD-main, this tunable is enabled and I plan to enable it in 
>> stable/14 somewhere around December 11, 2023.
>>
>> As of today I personally use block cloning on all my systems.
>
> I'd like to share a different data point.  It still panics on my 
> storage (running -CURRENT about a week ago) when enabled and can be 
> triggered by "make buildworld buildkernel".  I wasn't able to capture 
> a core dump until the most recent panic, which reported:
>
>
> cpuid = 2
> time = 1699593456
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 
> 0xfffffe022f2bd7e0
> vpanic() at vpanic+0x132/frame 0xfffffe022f2bd910
> spl_panic() at spl_panic+0x3a/frame 0xfffffe022f2bd970
> dmu_brt_clone() at dmu_brt_clone+0x555/frame 0xfffffe022f2bd9e0
> zfs_clone_range() at zfs_clone_range+0xa4c/frame 0xfffffe022f2bdbb0
> zfs_freebsd_copy_file_range() at 
> zfs_freebsd_copy_file_range+0x18a/frame 0xfffffe022f2bdc30
> vn_copy_file_range() at vn_copy_file_range+0x163/frame 0xfffffe022f2bdce0
> kern_copy_file_range() at kern_copy_file_range+0x380/frame 
> 0xfffffe022f2bddb0
> sys_copy_file_range() at sys_copy_file_range+0x78/frame 
> 0xfffffe022f2bde00
> amd64_syscall() at amd64_syscall+0x153/frame 0xfffffe022f2bdf30
> fast_syscall_common() at fast_syscall_common+0xf8/frame 
> 0xfffffe022f2bdf30
> --- syscall (569, FreeBSD ELF64, copy_file_range), rip = 
> 0x7fbb2da4ada, rsp = 0x7fbb02c5d48, rbp = 0x7fbb02c61e0 ---
> Uptime: 2h32m27s
> Dumping 7800 out of 32696 
> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
>
> #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
> #1  doadump (textdump=textdump@entry=1) at 
> /usr/src/sys/kern/kern_shutdown.c:405
> #2  0xffffffff80694480 in kern_reboot (howto=260) at 
> /usr/src/sys/kern/kern_shutdown.c:526
> #3  0xffffffff8069497f in vpanic (fmt=0xffffffff82603415 "VERIFY3(nbps 
> == numbufs) failed (%llu == %llu)\n", ap=ap@entry=0xfffffe022f2bd950) 
> at /usr/src/sys/kern/kern_shutdown.c:970
> #4  0xffffffff8232999a in spl_panic (file=<optimized out>, 
> func=<optimized out>, line=<unavailable>, fmt=<unavailable>) at 
> /usr/src/sys/contrib/openzfs/module/os/freebsd/spl/spl_misc.c:103
> #5  0xffffffff823a6605 in dmu_brt_clone 
> (os=os@entry=0xfffff800c5ce4000, object=<optimized out>, 
> offset=offset@entry=0, length=length@entry=207477, 
> tx=tx@entry=0xfffff8071a108d00, bps=bps@entry=0xfffffe01e218c000, 
> nbps=2, replay=0)
>     at /usr/src/sys/contrib/openzfs/module/zfs/dmu.c:2303
> #6  0xffffffff8250f67c in zfs_clone_range (inzp=0xfffff804416ac000, 
> inoffp=0xfffff800b81cb048, outzp=0xfffff806f58f03a0, 
> outoffp=0xfffff800b8063048, lenp=lenp@entry=0xfffffe022f2bdbf0, 
> cr=0xfffff8000a6fe600)
>     at /usr/src/sys/contrib/openzfs/module/zfs/zfs_vnops.c:1326
> #7  0xffffffff8234b3ba in zfs_freebsd_copy_file_range 
> (ap=0xfffffe022f2bdc48) at 
> /usr/src/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c:6294
> #8  0xffffffff8079f443 in VOP_COPY_FILE_RANGE 
> (invp=0xfffff804416cb1c0, inoffp=0xfffff800b81cb048, 
> outvp=0xfffff806f51d3380, outoffp=0xfffff800b8063048, 
> lenp=0xfffffe022f2bdd48, incred=0xfffff8000a6fe600, flags=<optimized 
> out>,
>     outcred=<optimized out>, fsizetd=<optimized out>) at 
> ./vnode_if.h:2385
> #9  vn_copy_file_range (invp=invp@entry=0xfffff804416cb1c0, 
> inoffp=inoffp@entry=0xfffff800b81cb048, 
> outvp=outvp@entry=0xfffff806f51d3380, 
> outoffp=outoffp@entry=0xfffff800b8063048, 
> lenp=lenp@entry=0xfffffe022f2bdd48, flags=flags@entry=0,
>     incred=0xfffff8000a6fe600, outcred=0xfffff8000a6fe600, 
> fsize_td=0xfffffe022925b3a0) at /usr/src/sys/kern/vfs_vnops.c:3087
> #10 0xffffffff8079a070 in kern_copy_file_range 
> (td=td@entry=0xfffffe022925b3a0, infd=<optimized out>, 
> inoffp=0xfffff800b81cb048, inoffp@entry=0x0, outfd=<optimized out>, 
> outoffp=0xfffff800b8063048, outoffp@entry=0x0, len=9223372036854775807,
>     flags=0) at /usr/src/sys/kern/vfs_syscalls.c:4973
> #11 0xffffffff8079a178 in sys_copy_file_range (td=0xfffffe022925b3a0, 
> uap=0xfffffe022925b7a0) at /usr/src/sys/kern/vfs_syscalls.c:5011
> #12 0xffffffff80a97aa3 in syscallenter (td=0xfffffe022925b3a0) at 
> /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:188
> #13 amd64_syscall (td=0xfffffe022925b3a0, traced=0) at 
> /usr/src/sys/amd64/amd64/trap.c:1194
> #14 <signal handler called>
> #15 0x000007fbb2da4ada in ?? ()
>
>
> and disabling bclone does appear to allow me to finish buildworld / 
> buildkernel.
>
> The pool didn't have redaction_list_spill enabled.
>
> The ASSERT3U(nbps, ==, numbufs); in dmu_brt_clone was added when block 
> cloning was first implemented.
>
> It seems that I am the only person who is seeing this as of today.  It 
> seems that block clone was indeed being used for some data:
>
> saturn  bcloneused 1.18M                          -
> saturn  bclonesaved 1.21M                          -
> saturn  bcloneratio 2.02x                          -
>
> The pool has dedup enabled for some datasets.
>
> Any suggestions?  (In extreme cases I can recreate the storage pool 
> from backup or copy the data somewhere else, then recreate the pool, 
> then copy data back, but I'd like to avoid that if possible)
>
> Cheers,
>
>
>>
>> mm
>>
>> On 04/11/2023 13:35, Mark Millard wrote:
>>> On Nov 4, 2023, at 04:38, Mike Karels <mike@karels.net> wrote:
>>>
>>>> On 4 Nov 2023, at 4:01, Ronald Klop wrote:
>>>>
>>>>> On 11/4/23 02:39, Mark Millard wrote:
>>>>>> It looks to me like releng/14.0 (as of 14.0-RC4) still has:
>>>>>>
>>>>>> int zfs_bclone_enabled;
>>>>>> SYSCTL_INT(_vfs_zfs, OID_AUTO, bclone_enabled, CTLFLAG_RWTUN,
>>>>>> &zfs_bclone_enabled, 0, "Enable block cloning");
>>>>>>
>>>>>> leaving block cloning effectively disabled by default, no
>>>>>> matter what the pool has enabled.
>>>>>>
>>>>>> https://www.freebsd.org/releases/14.0R/relnotes/ also reports:
>>>>>>
>>>>>> QUOTE
>>>>>> OpenZFS has been upgraded to version 2.2. New features include:
>>>>>> • block cloning, which allows shallow copies of blocks in file 
>>>>>> copies. This is optional, and disabled by default; it can be 
>>>>>> enabled with sysctl vfs.zfs.bclone_enabled=1.
>>>>>> END QUOTE
>>>>>>
>>>>>
>>>>> I think this answers your question in the subject.
>>>> I think so too (and I wrote that text).
>>> Thanks for the confirmation of the final intent.
>>>
>>> I believe this makes:
>>>
>>> QUOTE
>>> author Brian Behlendorf <behlendorf1@llnl.gov> 2023-05-25 20:53:08 
>>> +0000
>>> committer GitHub <noreply@github.com> 2023-05-25 20:53:08 +0000
>>> commit 91a2325c4a0fbe01d0bf212e44fa9d85017837ce (patch)
>>> tree dd01dfce6aeef357ade1775acf18aade535c6271
>>> . . .
>>> Update compatibility.d files
>>>
>>> Add an openzfs-2.2 compatibility file for the next release. Edon-R 
>>> support has been enabled for FreeBSD removing the need for different 
>>> FreeBSD and Linux files. Symlinks for the -linux and -freebsd names 
>>> are created for any scripts expecting that convention. Additionally, 
>>> a symlink for ubunutu-22.04 was added. Signed-off-by: Brian 
>>> Behlendorf <behlendorf1@llnl.gov> Closes #14833
>>> END QUOTE
>>>
>>> technically incorrect in that compatibility.d/openzfs-2.2-freebsd
>>> should be distinct in content from compatibility.d/openzfs-2.2 so
>>> that block cloning would not be enabled.
>>>
>>>
>>>>>> Just curiosity on my part about the default completeness of
>>>>>> openzfs-2.2 support, not an objection either way.
>>>>>>
>>>>>
>>>>> I haven't seen new issues with block cloning in the last few weeks 
>>>>> mentioned on the mailing lists. All known issues are fixed AFAIK.
>>>>> But I can imagine that the risk+effect ratio of data corruption is 
>>>>> seen as a bit too high for a 14.0 release for this particular 
>>>>> feature. That does not diminish the rest of the completeness of 
>>>>> openzfs-2.2.
>>>>>
>>>>> NB: I'm not involved in developing openzfs or the decision making 
>>>>> in the release. Just repeating what I read on the lists.
>>>> There was another block cloning fix in 14.0-RC4; see the commit log.
>>>> Maybe there will be no more issues, but it seems that corner cases
>>>> were still being found recently.
>>> Looks like I'll stay at openzfs-2.1 pool features until there is
>>> a release that no longer has the default status:
>>>
>>> 0 for sysctl vfs.zfs.bclone_enabled
>>>
>>> I use main [so: 15 now] but only enable openzfs-2.* pool features
>>> supported by default on some FreeBSD release, that has an accurate
>>> compatibility.d/openzfs-2.*-freebsd file.
>>>
>>> ===
>>> Mark Millard
>>> marklmi at yahoo.com
>>>
>>>
>>
>