Re: RFC: Does ZFS block cloning do this?

From: Alan Somers <asomers_at_freebsd.org>
Date: Thu, 07 Aug 2025 17:22:41 UTC
On Thu, Aug 7, 2025 at 8:32 AM Rick Macklem <rick.macklem@gmail.com> wrote:

> On Wed, Aug 6, 2025 at 9:46 AM Rick Macklem <rick.macklem@gmail.com> wrote:
> >
> > On Wed, Aug 6, 2025 at 9:28 AM Alexander Motin <mav@freebsd.org> wrote:
> > >
> > > Hi Rick,
> > >
> > > On 8/6/25 11:54, Rick Macklem wrote:
> > > > The difference for NFSv4.2 is that CLONE cannot return with partial
> > > > completion. (It assumes that a CLONE of any size will complete
> > > > quickly enough for an RPC. Although there is no fixed limit, most
> > > > assume an RPC reply should happen in 1-2sec at most. For COPY, the
> > > > server can return with only part of the copy done.)
> > > > It also includes alignment restrictions for the byte offsets.
> > > >
> > > > There is also the alignment restriction on CLONE. There doesn't seem
> > > > to be an alignment restriction on zfs_clone_range(), but maybe it is
> > > > buried inside it?
> > > > I think adding yet another pathconf name to get the alignment
> > > > requirement and whether or not the file system supports it would
> > > > work without any VOP change.
> > >
> > > The semantics you describe look similar to Linux FICLONE/FICLONERANGE
> > > calls, that got adopted there before copy_file_range().  IIRC those
> > > effectively mean -- clone the file or its range as requested or fail.
> > > I am not sure why some people prefer those calls, explicitly not
> > > allowing fallback to copy, but there are some, for example Veeam
> > > backup fails if ZFS rejects the cloning request for any reason.  For
> > > Linux ZFS has separate code (see zpl_remap_file_range() and respective
> > > VFS calls) wrapping around block cloning to implement these semantics.
> > > FreeBSD does not have the equivalent at this point, but it would be
> > > trivial to add, if we really need those VOPs.
> > For NFSv4.2 (which I suspect was modelled after what Linux does) the
> > difference is the ability to complete the entire "copy" within 1-2sec
> > under normal circumstances.
> > --> The NFSv4.2 CLONE operation requires this.
> > whereas for the NFSv4.2 COPY
> > --> It is allowed to return after a partial completion to adhere to the
> >       1-2sec rule. This probably does not affect ZFS, but it is needed
> >       for the "in general" UFS case.
> >
> > There may be no difference needed for zfs_copy_file_range(), so long as
> > it never returns after a partial completion. If it does return after
> > partial completion, a flag would indicate "must complete it".
> >
> > As for FreeBSD syscalls, I don't see a need for a new one.
> > I'll leave that up to others.
> > pathconf(2) could be used to determine if cloning is supported.
> >
> > Thanks for all the comments. It looks like a new "kernel only" flag for
> > VOP_COPY_FILE_RANGE() and a new name for VOP_PATHCONF()
> > should be all that is needed.
> So, this seems almost too easy?
>
> What I am thinking of (and should be easy to do in the next few days
> for 15.0) is:
> - Define a new pathconf variable _PC_CLONE_BLKSIZE which returns
>   the blksize for cloning or 0 if cloning is not supported.
> - Define a new flag for copy_file_range() called COPY_FILE_RANGE_CLONE
>   which, if set, would require that the entire copy be completed via
>   cloning (no partial copy allowed) or return ENOSYS if the file system
>   does not support this.
>   Expose this flag to userland in case any application really needs
>   cloning.
> The code changes outside of NFS are trivial.
>
> So, how does this sound? ric


Yes, I think that would work.
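
For illustration only, here is a rough userland sketch of how the proposed
pathconf name might be consulted before asking for a clone. It assumes
_PC_CLONE_BLKSIZE ends up in <unistd.h> as described above (hence the #ifdef),
and clone_blksize() and clone_offsets_aligned() are hypothetical helper names,
not existing API. Only the byte offsets are checked, since those are the
alignment restrictions mentioned in the thread; how the length should be
treated is left out.

```c
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Return the clone block size for path, or 0 if cloning is not supported
 * (or if this system's headers do not define the proposed name).
 */
static long
clone_blksize(const char *path)
{
#ifdef _PC_CLONE_BLKSIZE
	long v = pathconf(path, _PC_CLONE_BLKSIZE);

	return (v > 0 ? v : 0);
#else
	(void)path;	/* proposed name not available here */
	return (0);
#endif
}

/* Hypothetical helper: are both byte offsets multiples of blksize? */
static bool
clone_offsets_aligned(long blksize, off_t inoff, off_t outoff)
{
	if (blksize <= 0 || inoff < 0 || outoff < 0)
		return (false);	/* a blksize of 0 means "no cloning" */
	return (inoff % blksize == 0 && outoff % blksize == 0);
}
```

A caller would presumably pass COPY_FILE_RANGE_CLONE to copy_file_range() only
when clone_blksize() returns non-zero and the offsets pass the check, and
otherwise fall back to a plain copy.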