Re: RFC: Should copy_file_range(2) return after a few seconds?

From: Konstantin Belousov <kostikbel_at_gmail.com>
Date: Tue, 11 Nov 2025 16:03:26 UTC
On Tue, Nov 11, 2025 at 09:33:52AM -0500, Mark Johnston wrote:
> On Mon, Nov 10, 2025 at 01:02:54AM -0800, Rick Macklem wrote:
> > On Mon, Nov 10, 2025 at 12:15 AM Don Lewis <truckman@freebsd.org> wrote:
> > >
> > > On  9 Nov, Rick Macklem wrote:
> > > > On Sat, Nov 8, 2025 at 11:14 PM Ronald Klop <ronald-lists@klop.ws> wrote:
> > > >>
> > > >>
> > > >> From: Rick Macklem <rick.macklem@gmail.com>
> > > >> Date: 9 November 2025 00:23
> > > >> To: FreeBSD CURRENT <freebsd-current@freebsd.org>
> > > >> CC: Peter 'PMc' Much <pmc@citylink.dinoex.sub.org>
> > > >> Subject: RFC: Should copy_file_range(2) return after a few seconds?
> > > >>
> > > >> Hi,
> > > >>
> > > >> Peter Much reported a problem on the freebsd-fs@ mailing
> > > >> list on Oct. 21 under the Subject: "Why does rangelock_enqueue()
> > > >> hang for hours?".
> > > >>
> > > >> The problem was that he had a copy_file_range(2) copying
> > > >> between a large NFS file and a local file that was taking 2hrs.
> > > >> While this copy_file_range(2) was in progress, it was holding
> > > >> a rangelock for the entire output file, causing another process
> > > >> trying to read the output file to hang, waiting for the rangelock.
> > > >>
> > > >> Since copy_file_range(2) is not part of any standard (we are just
> > > >> trying to emulate the Linux one), there is no definitive answer w.r.t.
> > > >> should it hold rangelocks.  However, that is how it is currently
> > > >> coded and I, personally, think it is appropriate to do so.
> > > >>
> > > >> Having a copy_file_range(2) syscall take two hours is
> > > >> definitely an unusual case, but doesn't it seem
> > > >> excessive?
> > > >>
> > > >> Peter tried a quick patch I gave him that limited the
> > > >> copy_file_range(2) to 1sec and it fixed the problem
> > > >> he was observing.
> > > >>
> > > >> Which brings me to the question...
> > > >> Should copy_file_range(2) be time limited?
> > > >> And, if the answer to this is "yes", how long do
> > > >> you think the time limit should be?
> > > >> (1sec, 2-5sec or ??)
> > > >>
> > > >> Note that the longer you allow copy_file_range(2)
> > > >> to continue, the more efficient it will be.
> > > >>
> > > >> Thanks in advance for any comments, rick
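
Callers generally have to loop on copy_file_range(2) anyway, because it may copy fewer bytes than requested. A time-limited syscall would therefore just produce more short returns for such a caller. Below is a minimal sketch of that loop; the helper name copy_all() is hypothetical, and _GNU_SOURCE is defined only so that glibc exposes the prototype (FreeBSD declares it in <unistd.h>).

```c
#define _GNU_SOURCE     /* exposes copy_file_range() in glibc's <unistd.h> */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>      /* open(2) flags for callers */
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy all of infd to outfd with copy_file_range(2),
 * reissuing the call after every short return.  If the kernel stopped
 * early (e.g. after a time limit), the caller only sees a short count.
 * Returns the number of bytes copied, or -1 with errno set.
 */
static off_t
copy_all(int infd, int outfd)
{
    struct stat st;
    off_t inoff = 0, outoff = 0, done = 0;

    if (fstat(infd, &st) != 0)
        return (-1);
    assert(st.st_size >= 0);
    while (done < st.st_size) {
        /* The kernel advances inoff/outoff; file offsets are untouched. */
        ssize_t n = copy_file_range(infd, &inoff, outfd, &outoff,
            (size_t)(st.st_size - done), 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return (-1);
        }
        if (n == 0)     /* input hit EOF earlier than expected */
            break;
        done += n;
    }
    return (done);
}
```

Because explicit offset pointers are passed, the loop only has to track how much remains; how long each individual call runs is invisible to it.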
> > > >>
> > > >> ________________________________
> > > >>
> > > >>
> > > >>
> > > >> Why is this locking needed?
> > > >> AFAIK Unix has advisory locking, so if you read a file somebody else is writing, the result is your own problem. It is up to the applications to adhere to the locking.
> > > >> Is this a lock different from file locking from user space?
> > > > Yes. A rangelock is used for a byte range during a read(2) or
> > > > write(2) to ensure that they are serialized.  This is a POSIX
> > > > requirement. (See this post by kib@ in the original email
> > > > discussion. https://lists.freebsd.org/archives/freebsd-fs/2025-October/004704.html)
> > > >
> > > > Since there is no POSIX standard for copy_file_range(), it could
> > > > be argued that range locking isn't required for copy_file_range(),
> > > > but that makes it inconsistent with read(2)/write(2) behaviour.
> > > > (I, personally, am more comfortable with a return after N sec
> > > > than removing the range locking, but that's just my opinion.)
> > > >
> > > > rick
> > > >
> > > >> Why can’t one tail a file that is being written by copy_file_range if none of the applications request a lock?
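
To make the distinction concrete: user-space file locking is POSIX advisory byte-range locking through fcntl(2), which only constrains processes that also ask for locks. The rangelock Rick describes is instead taken by the kernel itself during read(2)/write(2), not requested by applications. A minimal sketch of the advisory side follows; lock_range() and unlock_range() are hypothetical helper names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: take an advisory write lock on [off, off + len)
 * of fd (len 0 means "to end of file").  F_SETLKW sleeps until no
 * conflicting fcntl() lock exists.  A plain read(2) by a process that
 * never calls fcntl() is NOT blocked by this lock.
 */
static int
lock_range(int fd, off_t off, off_t len)
{
    struct flock fl = {
        .l_type = F_WRLCK,
        .l_whence = SEEK_SET,
        .l_start = off,
        .l_len = len,
    };

    assert(len >= 0);   /* this sketch only handles forward ranges */
    return (fcntl(fd, F_SETLKW, &fl));
}

/* Hypothetical helper: release the advisory lock on [off, off + len). */
static int
unlock_range(int fd, off_t off, off_t len)
{
    struct flock fl = {
        .l_type = F_UNLCK,
        .l_whence = SEEK_SET,
        .l_start = off,
        .l_len = len,
    };

    return (fcntl(fd, F_SETLK, &fl));
}
```

Such a lock has no effect on an uncooperative reader, which is why it cannot provide the read(2)/write(2) serialization that the kernel rangelock enforces.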
> > >
> > > Since writes don't go backwards, it would seem to make sense to advance
> > > the start of the range lock as the copy proceeds.
> > The current code does the rangelock above the VOP layer and,
> > for ZFS, if block cloning is enabled, the entire copy happens
> > all at once and fairly quickly (it's copy on write as I understand it).
> 
> I think the rangelock holder can detect that other threads are sleeping,
> blocked on the lock.  In this case, perhaps filesystems should
> periodically check for contention, and if present could return to the
> syscall layer to release the lock and give other threads a chance to
> proceed?
And what is the use of rangelocks then?  The proposed change would
break the atomicity of reads vs writes.
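
The following userland model, not kernel code, sketches the two early-return policies under discussion: stop once a time limit passes (the quick patch used 1 second) or stop once another thread is waiting for the lock. struct copy_state, copy_chunked() and the waiters flag are hypothetical; waiters merely stands in for rangelock contention, and each return is the point where the kernel would release the rangelock.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical state for a copy that may be split across several calls. */
struct copy_state {
    const char *src;
    char *dst;
    size_t len;     /* total bytes to copy */
    size_t done;    /* bytes copied so far */
};

static double
now_sec(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (ts.tv_sec + ts.tv_nsec / 1e9);
}

/*
 * Copy chunk by chunk until limit_sec elapses or *waiters becomes true.
 * Always copies at least one chunk if anything remains, so the copy
 * makes progress.  Returns the bytes copied by this call (0 when done).
 */
static size_t
copy_chunked(struct copy_state *cs, size_t chunk, double limit_sec,
    const volatile bool *waiters)
{
    double deadline = now_sec() + limit_sec;
    size_t start = cs->done;

    assert(chunk > 0);
    while (cs->done < cs->len) {
        size_t n = cs->len - cs->done;

        if (n > chunk)
            n = chunk;
        memcpy(cs->dst + cs->done, cs->src + cs->done, n);
        cs->done += n;
        /* Between chunks: where the lock would be dropped. */
        if (*waiters || now_sec() >= deadline)
            break;
    }
    return (cs->done - start);
}
```

With a waiter present, each call copies exactly one chunk and returns. Between two such calls a reader could see a destination that is only partly copied, which is precisely the loss of read-vs-write atomicity objected to above.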

> 
> > I can't recall for certain, but I think the rangelock must be acquired
> > before the vnode lock(s), so I don't think moving it to below the
> > VOP layer is practical?
> > 
> > rick
> > 
> > >  As long as the read
> > > position + length is before the write position, there is no reason to
> > > block the read.  Running "cat outfile" would look a lot like tail -f,
> > > because cat would temporarily block whenever it caught up with the
> > > copy and then continue with the newly written data.
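
A minimal sketch of the advancing-lock idea, assuming the copy's lock covers only the not-yet-written tail [write_pos, copy_end), with all ranges treated as half-open. The names byte_range, ranges_overlap() and reader_must_wait() are hypothetical and do not correspond to FreeBSD's rangelock code.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

/* Half-open byte range [start, end). */
struct byte_range {
    off_t start;
    off_t end;
};

/* Two half-open ranges conflict iff each starts before the other ends. */
static bool
ranges_overlap(struct byte_range a, struct byte_range b)
{
    return (a.start < b.end && b.start < a.end);
}

/*
 * Hypothetical model: the copy holds a lock only on the tail it has not
 * written yet, so the lock's start advances with write_pos.  A reader of
 * [read_off, read_off + read_len) waits only if it overlaps that tail.
 */
static bool
reader_must_wait(off_t read_off, off_t read_len, off_t write_pos,
    off_t copy_end)
{
    struct byte_range locked = { write_pos, copy_end };
    struct byte_range rd = { read_off, read_off + read_len };

    assert(read_len >= 0 && write_pos <= copy_end);
    return (ranges_overlap(rd, locked));
}
```

For example, for a 1 MiB copy that has written its first 256 KiB (write_pos = 262144), a 4 KiB read at offset 0 proceeds, while a 4 KiB read at offset 260000 waits because it extends past the write position.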
> > >
> > > tail is a bit funky, though.  If the size of the destination file is
> > > updated periodically during the copy, tail could return early with an
> > > earlier part of the file.  If the size is updated immediately to the
> > > final size, then tail will wait for the copy to complete, but will
> > > output the true end of the file.
> > >
> > > What about backups?
> >