Re: git: 8ee579abe09e - main - zfs: fall back if block_cloning feature is disabled

From: Cy Schubert <Cy.Schubert_at_cschubert.com>
Date: Tue, 04 Apr 2023 15:02:05 UTC
In message <CAGudoHEvGDUQkYe8LwUXgTZZa+6DAFXVtspCX-Mn2egDO2oc_w@mail.gmail.com>,
Mateusz Guzik writes:
> On 4/4/23, Cy Schubert <Cy.Schubert@cschubert.com> wrote:
> > In message <202304041145.334Bjx6l035872@gitrepo.freebsd.org>, Martin
> > Matuska writes:
> >> The branch main has been updated by mm:
> >>
> >> URL:
> >> https://cgit.FreeBSD.org/src/commit/?id=8ee579abe09ec1fe15c588fc9a08370b83b81cd6
> >>
> >> commit 8ee579abe09ec1fe15c588fc9a08370b83b81cd6
> >> Author:     Martin Matuska <mm@FreeBSD.org>
> >> AuthorDate: 2023-04-04 11:40:41 +0000
> >> Commit:     Martin Matuska <mm@FreeBSD.org>
> >> CommitDate: 2023-04-04 11:43:34 +0000
> >>
> >>     zfs: fall back if block_cloning feature is disabled
> >>
> >>     If block_cloning is disabled, or other errors from zfs_clone_range()
> >>     return an EXDEV we should fall back to vn_generic_copy_file_range().
> >>
> >>     This fixes issues when copying files on the same dataset with
> >>     block_cloning disabled.
> >>
> >>     Upstreamed as pull request to OpenZFS.
> >>
> >>     Reviewed by:    Mateusz Guzik <mjguzik@gmail.com>
> >>     OpenZFS pull request:   14713
> >> ---
> >>  .../openzfs/module/os/freebsd/zfs/zfs_vnops_os.c        | 17 ++++++++++-------
> >>  1 file changed, 10 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c b/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
> >> index 97429b360a36..2cd1d27e37bc 100644
> >> --- a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
> >> +++ b/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
> >> @@ -6243,13 +6243,6 @@ zfs_freebsd_copy_file_range(struct vop_copy_file_range_args *ap)
> >>  	int error;
> >>  	uint64_t len = *ap->a_lenp;
> >>
> >> -	/*
> >> -	 * TODO: If offset/length is not aligned to recordsize, use
> >> -	 * vn_generic_copy_file_range() on this fragment.
> >> -	 * It would be better to do this after we lock the vnodes, but then we
> >> -	 * need something else than vn_generic_copy_file_range().
> >> -	 */
> >> -
> >>  	/* Lock both vnodes, avoiding risk of deadlock. */
> >>  	do {
> >>  		mp = NULL;
> >> @@ -6300,6 +6293,16 @@ unlock:
> >>  	if (mp != NULL)
> >>  		vn_finished_write(mp);
> >>
> >> +	/*
> >> +	 * Fall back if block_cloning feature is disabled
> >> +	 * or other EXDEV failures from zfs_vnops.c
> >> +	 */
> >> +	if (error == EXDEV) {
> >> +		error = vn_generic_copy_file_range(ap->a_invp, ap->a_inoffp,
> >> +			    ap->a_outvp, ap->a_outoffp, ap->a_lenp, ap->a_flags,
> >> +			    ap->a_incred, ap->a_outcred, ap->a_fsizetd);
> >> +	}
> >> +
> >>  	return (error);
> >>  }
> >>
> >>
> >
> > This is too late to fall back. On Rick's suggestion the following makes the
> > determination at zfs_freebsd_copy_file_range() entry much earlier.
> >
>
> It's not too late, but I agree it is faster to bail out early.
>
> The proposed patch adds a condition which *differs* from the one in
> zfs_clone_range:
>         if (dmu_objset_spa(inos) != dmu_objset_spa(outos)) {
>                 zfs_exit_two(inzfsvfs, outzfsvfs, FTAG);
>                 return (SET_ERROR(EXDEV));
>         }
>
> ... meaning with the proposed patch the routine can still fail with
> EXDEV, making zfs_freebsd_copy_file_range also do it, which must not
> happen.
>
> That aside the code looks rather suspicious for the case where target
> and source vnode are the same. iow more work is needed here.
>
> As the vnode is unlocked, you *can't* safely access zfsvfs_t
> *outzfsvfs = ZTOZSB(outzp); in that spot in this manner -- a forced
> unmount at the same time can free it.
>
> iow this patch does *NOT* work.
>
> With the committed variant the situation is damage controlled enough
> that there is time to sort it out correctly.
>
> > diff --git a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
> > b/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
[...]

Gotcha. What you're suggesting is something more like this: check for
block_cloning up front, and also fall back to vn_generic_copy_file_range()
should zfs_clone_range() return EXDEV for any other reason.

diff --git a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c b/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
index baa2ee5b3824..60916bfcfbc3 100644
--- a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
+++ b/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
@@ -6239,6 +6239,9 @@ zfs_freebsd_copy_file_range(struct vop_copy_file_range_args *ap)
 	struct vnode *invp = ap->a_invp;
 	struct vnode *outvp = ap->a_outvp;
 	struct mount *mp;
+	znode_t *outzp;
+	zfsvfs_t *outzfsvfs;
+	objset_t *outos;
 	struct uio io;
 	int error;
 	uint64_t len = *ap->a_lenp;
@@ -6276,6 +6279,19 @@ zfs_freebsd_copy_file_range(struct vop_copy_file_range_args *ap)
 	} while (error == 0);
 	if (error != 0)
 		return (error);
+
+	outzp = VTOZ(ap->a_outvp);
+	outzfsvfs = ZTOZSB(outzp);
+	outos = outzfsvfs->z_os;
+
+	if (!spa_feature_is_enabled(dmu_objset_spa(outos),
+	    SPA_FEATURE_BLOCK_CLONING)) {
+		error = vn_generic_copy_file_range(ap->a_invp, ap->a_inoffp,
+			ap->a_outvp, ap->a_outoffp, ap->a_lenp, ap->a_flags,
+			ap->a_incred, ap->a_outcred, ap->a_fsizetd);
+		goto unlock;
+	}
+
 #ifdef MAC
 	error = mac_vnode_check_write(curthread->td_ucred, ap->a_outcred,
 	    outvp);
@@ -6291,6 +6307,11 @@ zfs_freebsd_copy_file_range(struct vop_copy_file_range_args *ap)
 
 	error = zfs_clone_range(VTOZ(invp), ap->a_inoffp, VTOZ(outvp),
 	    ap->a_outoffp, &len, ap->a_outcred);
+
+	if (error == EXDEV)
+		error = vn_generic_copy_file_range(ap->a_invp, ap->a_inoffp,
+			ap->a_outvp, ap->a_outoffp, ap->a_lenp, ap->a_flags,
+			ap->a_incred, ap->a_outcred, ap->a_fsizetd);
 	*ap->a_lenp = (size_t)len;
 
 unlock:
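
To make the intended control flow easier to discuss, here is a small
userland model of it. This is only a sketch under stated assumptions, not
kernel code: struct copy_req, clone_range(), generic_copy() and
copy_file_range_sketch() are hypothetical stand-ins for the pool state,
zfs_clone_range(), vn_generic_copy_file_range() and
zfs_freebsd_copy_file_range(), and vnode locking is deliberately left out.
It only shows the two fallback points: the early feature check and the
late EXDEV check.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the state the real code reads from the pool. */
struct copy_req {
	bool cloning_enabled;	/* models the block_cloning feature check */
	bool same_pool;		/* models dmu_objset_spa(inos) == dmu_objset_spa(outos) */
	int clone_calls;	/* times the clone path was attempted */
	int generic_calls;	/* times the generic path ran */
};

/* Stand-in for zfs_clone_range(): fails with EXDEV when it cannot clone. */
static int
clone_range(struct copy_req *r)
{
	r->clone_calls++;
	if (!r->cloning_enabled || !r->same_pool)
		return (EXDEV);
	return (0);
}

/* Stand-in for vn_generic_copy_file_range(): always succeeds here. */
static int
generic_copy(struct copy_req *r)
{
	r->generic_calls++;
	return (0);
}

/* Stand-in for zfs_freebsd_copy_file_range(). */
static int
copy_file_range_sketch(struct copy_req *r)
{
	int error;

	/* Early exit: do not even try to clone when the feature is off. */
	if (!r->cloning_enabled)
		return (generic_copy(r));

	error = clone_range(r);

	/* Late fallback: EXDEV must not be returned to the caller. */
	if (error == EXDEV)
		error = generic_copy(r);
	return (error);
}
```

With the feature off the model goes straight to the generic copy; with the
feature on but source and destination in different pools it tries the clone
once and then falls back; in neither case does EXDEV reach the caller.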


-- 
Cheers,
Cy Schubert <Cy.Schubert@cschubert.com>
FreeBSD UNIX:  <cy@FreeBSD.org>   Web:  https://FreeBSD.org
NTP:           <cy@nwtime.org>    Web:  https://nwtime.org

			e^(i*pi)+1=0