Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

From: Martin Matuska <mm_at_FreeBSD.org>
Date: Tue, 04 Apr 2023 10:35:51 UTC
I have opened a PR for this (includes patch):
https://github.com/openzfs/zfs/pull/14713

We still need to address the a_fsizetd problem.

Maybe

On 4. 4. 2023 7:22, Rick Macklem wrote:
> On Mon, Apr 3, 2023 at 10:07 PM Cy Schubert <Cy.Schubert@cschubert.com> wrote:
>> In message <CAM5tNy4p5UbAkujksyTR=J0tGuDf=Y6v2LuctK4P_p-H3PZpPg@mail.gmail.com>,
>> Rick Macklem writes:
>>> On Mon, Apr 3, 2023 at 9:00 PM Cy Schubert <Cy.Schubert@cschubert.com> wrote:
>>>> In message <202304031513.333FD6qw014903@gitrepo.freebsd.org>, Martin
>>>> Matuska writes:
>>>>> The branch main has been updated by mm:
>>>>>
>>>>> URL: https://cgit.FreeBSD.org/src/commit/?id=2a58b312b62f908ec92311d1bd8536dbaeb8e55b
>>>>>
>>>>> commit 2a58b312b62f908ec92311d1bd8536dbaeb8e55b
>>>>> Merge: b98fbf3781df 431083f75bdd
>>>>> Author:     Martin Matuska <mm@FreeBSD.org>
>>>>> AuthorDate: 2023-04-03 14:49:30 +0000
>>>>> Commit:     Martin Matuska <mm@FreeBSD.org>
>>>>> CommitDate: 2023-04-03 14:49:30 +0000
>>>>>
>>>>>      zfs: merge openzfs/zfs@431083f75
>>>>>
>>>>>      Notable upstream pull request merges:
>>>>>        #12194 Fix short-lived txg caused by autotrim
>>>>>        #13368 ZFS_IOC_COUNT_FILLED does unnecessary txg_wait_synced()
>>>>>        #13392 Implementation of block cloning for ZFS
>>>>>        #13741 SHA2 reworking and API for iterating over multiple
>>>>>               implementations
>>>>>        #14282 Sync thread should avoid holding the spa config write lock
>>>>>               when possible
>>>>>        #14283 txg_sync should handle write errors in ZIL
>>>>>        #14359 More adaptive ARC eviction
>>>>>        #14469 Fix NULL pointer dereference in zio_ready()
>>>>>        #14479 zfs redact fails when dnodesize=auto
>>>>>        #14496 improve error message of zfs redact
>>>>>        #14500 Skip memory allocation when compressing holes
>>>>>        #14501 FreeBSD: don't verify recycled vnode for zfs control
>>>>>               directory
>>>>>        #14502 partially revert PR 14304 (eee9362a7)
>>>>>        #14509 Fix per-jail zfs.mount_snapshot setting
>>>>>        #14514 Fix data race between zil_commit() and zil_suspend()
>>>>>        #14516 System-wide speculative prefetch limit
>>>>>        #14517 Use rw_tryupgrade() in dmu_bonus_hold_by_dnode()
>>>>>        #14519 Do not hold spa_config in ZIL while blocked on IO
>>>>>        #14523 Move dmu_buf_rele() after dsl_dataset_sync_done()
>>>>>        #14524 Ignore too large stack in case of dsl_deadlist_merge
>>>>>        #14526 Use .section .rodata instead of .rodata on FreeBSD
>>>>>        #14528 ICP: AES-GCM: Refactor gcm_clear_ctx()
>>>>>        #14529 ICP: AES-GCM: Unify gcm_init_ctx() and gmac_init_ctx()
>>>>>        #14532 Handle unexpected errors in zil_lwb_commit() without
>>>>>               ASSERT()
>>>>>        #14544 icp: Prevent compilers from optimizing away memset()
>>>>>               in gcm_clear_ctx()
>>>>>        #14546 Revert zfeature_active() to static
>>>>>        #14556 Remove bad kmem_free() oversight from previous
>>>>>               zfsdev_state_list patch
>>>>>        #14563 Optimize the is_l2cacheable functions
>>>>>        #14565 FreeBSD: zfs_znode_alloc: lock the vnode earlier
>>>>>        #14566 FreeBSD: fix false assert in cache_vop_rmdir when
>>>>>               replaying ZIL
>>>>>        #14567 spl: Add cmn_err_once() to log a message only on the
>>>>>               first call
>>>>>        #14568 Fix incremental receive silently failing for recursive
>>>>>               sends
>>>>>        #14569 Restore ASMABI and other Unify work
>>>>>        #14576 Fix detection of IBM Power8 machines (ISA 2.07)
>>>>>        #14577 Better handling for future crypto parameters
>>>>>        #14600 zcommon: Refactor FPU state handling in fletcher4
>>>>>        #14603 Fix prefetching of indirect blocks while destroying
>>>>>        #14633 Fixes in persistent error log
>>>>>        #14639 FreeBSD: Remove extra arc_reduce_target_size() call
>>>>>        #14641 Additional limits on hole reporting
>>>>>        #14649 Drop lying to the compiler in the fletcher4 code
>>>>>        #14652 panic loop when removing slog device
>>>>>        #14653 Update vdev state for spare vdev
>>>>>        #14655 Fix cloning into already dirty dbufs
>>>>>        #14678 Revert "Do not hold spa_config in ZIL while blocked on IO"
>>>>>
>>>>>      Obtained from:  OpenZFS
>>>>>      OpenZFS commit: 431083f75bdd3efaee992bdd672625ec7240d252
>>>> Another problem related to copy_file_range() is the following example.
>>>>
>>>> slippy$ df -h build/make/Makefile .
>>>> Filesystem        Size    Used   Avail Capacity  Mounted on
>>>> t/wrkdir/amd64     52G     53M     52G     0%    /export/wrkdir/amd64
>>>> t/wrkdir/amd64     52G     53M     52G     0%    /export/wrkdir/amd64
>>>> slippy$ cp build/make/Makefile .
>>>> cp: build/make/Makefile: Cross-device link
>>>> slippy$
>>>>
>>>> And,
>>>>
>>>> slippy$ cp y4menc.h foobar
>>>> cp: y4menc.h: Cross-device link
>>>> slippy$
>>>>
>>>> But the following works because /tmp is mfs and /var/tmp is ufs.
>>>>
>>>> slippy$ cp y4menc.h /tmp
>>>> slippy$
>>>> slippy$ cp y4menc.h /var/tmp
>>>> slippy$
>>>>
>>>> Copying files from one zpool to a dataset in another zpool also works, as
>>>> does copying files between different datasets on the same zpool. Only
>>>> copying files from/to the same dataset results in Cross-device link.
>>>>
>>> zfs_copy_file_range() will only be called if the source and destination
>>> both exist on the same mount point. (I'm guessing that's what you mean
>>> by "same dataset".)
>> Correct.
>>
>> The long term solution to the problem (for users) would be to enable
>> block_cloning. One can test this as follows: checkpoint the pool, enable
>> block_cloning and observe that the problem goes away, then export the pool
>> and import it with --rewind-to-checkpoint to look at the regression again.
>>
>> As to why we see a Cross-device link,
>>
>> dtrace: script './dtrace.d' matched 1 probe
>> CPU     ID                    FUNCTION:NAME
>>    3  26104        vn_copy_file_range:return int64_t 0x12
>>
>> 0x12 == EXDEV.
>>
>> As to why, zfs_clone_range() is called unconditionally at
>> zfs/zfs_vnops_os.c:6292. If the feature is not enabled zfs_clone_range()
>> sets the return code to EXDEV at line 1083 of zfs_vnops.c. Therefore
>> block_cloning MUST be enabled.
>>
>> To reiterate, the long term solution would be to enable block_cloning. That
>> still leaves those who either wish or need to defer enabling block_cloning,
>> because the zpool is occasionally imported by older versions of FreeBSD
>> requiring write access, or for some other reason.
>>
> If there is a way to test to see if block cloning is enabled at the beginning
> of zfs_copy_file_range(), it should just call vn_generic_copy_file_range()
> if it is not enabled (or something like that).
>
> rick
>
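The control flow Rick suggests above can be sketched as a small userland
model. This is only an illustration of the dispatch logic, not the kernel
code and not the patch in the PR: every model_* name below is hypothetical,
and the two helpers merely stand in for zfs_clone_range() and
vn_generic_copy_file_range() as they are described in this thread.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a pool; models only the feature flag. */
struct model_pool {
	bool block_cloning_enabled;
};

/* Records which copy path the model took. */
enum model_path { MODEL_PATH_NONE, MODEL_PATH_CLONE, MODEL_PATH_GENERIC };

/*
 * Models zfs_clone_range() as described in the thread: it returns EXDEV
 * when the block_cloning feature is not enabled.
 */
int
model_clone_range(const struct model_pool *pool)
{
	if (!pool->block_cloning_enabled)
		return (EXDEV);
	return (0);
}

/* Models vn_generic_copy_file_range(): a plain copy, assumed to succeed. */
int
model_generic_copy(void)
{
	return (0);
}

/* Behaviour reported in the thread: the clone path is always taken. */
int
model_copy_unconditional(const struct model_pool *pool, enum model_path *path)
{
	*path = MODEL_PATH_CLONE;
	return (model_clone_range(pool));
}

/* Suggested behaviour: check the feature first, fall back if disabled. */
int
model_copy_with_fallback(const struct model_pool *pool, enum model_path *path)
{
	if (!pool->block_cloning_enabled) {
		*path = MODEL_PATH_GENERIC;
		return (model_generic_copy());
	}
	*path = MODEL_PATH_CLONE;
	return (model_clone_range(pool));
}
```

With block_cloning_enabled set to false, model_copy_unconditional() returns
EXDEV, which matches the 0x12 seen in the dtrace output, while
model_copy_with_fallback() takes the generic path and returns 0.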
>> --
>> Cheers,
>> Cy Schubert <Cy.Schubert@cschubert.com>
>> FreeBSD UNIX:  <cy@FreeBSD.org>   Web:  https://FreeBSD.org
>> NTP:           <cy@nwtime.org>    Web:  https://nwtime.org
>>
>>                          e^(i*pi)+1=0
>>
>>
>>