RFC: What should a copy_file_range(2) syscall do by default?

Rick Macklem rmacklem at uoguelph.ca
Sun Jun 30 00:46:05 UTC 2019

Well, asomers@ prefers the current patch, despite its complexity.
sef@ expressed concerns w.r.t. the complexity and it being done in
the kernel. However he did not state a preference for any other specific
variant, such as the ones I suggested.

As such, I am sticking with the current patch unless I hear otherwise from

Thanks for your comments, rick
ps: I felt a top post was reasonable here, as it summarized separate posts
      in the thread.

From: Alan Somers <asomers at freebsd.org>
Sent: Saturday, June 22, 2019 1:28:03 PM
To: Rick Macklem
Cc: freebsd-fs at freebsd.org; Sean Fagan
Subject: Re: RFC: What should a copy_file_range(2) syscall do by default?

On Sat, Jun 22, 2019 at 10:02 AM Rick Macklem <rmacklem at uoguelph.ca> wrote:
> Hi,
> sef@ made this comment on phabricator. I don't believe phabricator is the correct
> place for "big picture" discussions, so I'm posting it here (I'm assuming sef@ doesn't
> mind, since the phabricator comments are public).
> sef@ wrote:
> >This much work in the kernel for what //should// be user-space makes me twitchy... >but there is lots of precedent for it, so I obviously have to get with the times.
> >
> >  I've done a quick review of the code; it seems most of the complexity is in the hole->detection.  I'm also annoyed that linux used size_t for the amount to copy, when >off_t would have been more appropriate.  But not much to do about that now.
> >
> >  Having a default implementation means that user-space can't fall back if it's not >supported, and do it better (e.g., parallel I/O).  Should we also have a pathconf for >the feature?
> >
> >  WRT your question on -fs, I have no objections to this working cross-filesystem, >although I think I might ask to have a flag to fail in that case.
> Well, all I am interested in is a system call/VOP call so the NFSv4.2 client can do
> a file copy locally on the NFS server instead of doing Reads/Writes across the wire.
> The current code has gotten fairly complex, so I'll try and ask "how complex" this
> syscall/VOP call should be?
> The range of variants I can think of are:
> 0) - Don't do it at all.
> 1) - The syscall could just do a VOP_COPY_FILE_RANGE() and return whatever error
>         it returns.
>         --> This implies an error return for all file systems for now, with support for
>               NFSv4.2mounts being added later (FreeBSD13 hopefully).

This option would require applications or the C library to fallback to
a copy loop.  While doable, nothing in userland would be able to
range-lock the file, making the copy loop non-atomic.  So the
in-kernel copy is superior.

> 2) - The syscall could fall back on a simple copy loop, but not try to deal with holes.
>        --> The Linux man page mentions using copy_file_range(2) in a loop with
>              lseek(SEEK_DATA)/lseek(SEEK_HOLE) for sparse files. This suggests that
>              the Linux fallback code doesn't try to handle holes.

Same problem as 1.  Or if you do the copy loop in-kernel it would
waste CPU time and expand sparse files, which isn't good either.

> 3) - The current patch which tries to handle holes and copy the entire byte range
>        in one call.

Definitely the best option, despite its complexity.  I would argue
that the complexity calls for a robust test suite, rather than
abandoning the feature.

More information about the freebsd-fs mailing list