Re: RFC: Should copy_file_range(2) return after a few seconds?

From: Konstantin Belousov <kib_at_freebsd.org>
Date: Sun, 09 Nov 2025 03:40:25 UTC

On Sat, Nov 08, 2025 at 06:09:15PM -0800, Rick Macklem wrote:
> On Sat, Nov 8, 2025 at 4:43 PM Konstantin Belousov <kib@freebsd.org> wrote:
> >
> > On Sat, Nov 08, 2025 at 03:22:37PM -0800, Rick Macklem wrote:
> > > Hi,
> > >
> > > Peter Much reported a problem on the freebsd-fs@ mailing
> > > list on Oct. 21 under the Subject: "Why does rangelock_enqueue()
> > > hang for hours?".
> > >
> > > The problem was that he had a copy_file_range(2) copying
> > > between a large NFS file and a local file that was taking 2hrs.
> > > While this copy_file_range(2) was in progress, it was holding
> > > a rangelock for the entire output file, causing another process
> > > trying to read the output file to hang, waiting for the rangelock.
> > >
> > > Since copy_file_range(2) is not in any standard (it just tries to
> > > emulate the Linux one), there is no definitive answer w.r.t.
> > > whether it should hold rangelocks.  However, that is how it is
> > > currently coded and I, personally, think it is appropriate to do so.
> > >
> > > Having a copy_file_range(2) syscall take two hours is
> > > definitely an unusual case, but it does seem that it is
> > > excessive?
> > >
> > > Peter tried a quick patch I gave him that limited the
> > > copy_file_range(2) to 1sec and it fixed the problem
> > > he was observing.
> > >
> > > Which brings me to the question...
> > > Should copy_file_range(2) be time limited?
> > > And, if the answer to this is "yes", how long do
> > > you think the time limit should be?
> > > (1sec, 2-5sec or ??)
> > >
> > > Note that the longer you allow copy_file_range(2)
> > > to continue, the more efficient it will be.
> > >
> > > Thanks in advance for any comments, rick
> >
> > For me, limiting a syscall by runtime is a very strange idea.
> > IMO it should not be done.
> >
> > What can be done, I think, is to add signal interruption points into
> > the copying loop.  AFAIR we request a chunk of some size to be
> > copied.  After each chunk is copied, the kernel could use
> > sig_intr() to check for either interruption or suspend conditions.
> > If it returns non-zero, you might finish the loop early, reporting
> > the partial copy.
> >
> It already interrupts when a signal like <ctrl>C is posted.
> 
> The reporter didn't want the copy (he was actually using "cat")
> to terminate. He wanted to read the output file when it was
> only partially copied, which doesn't work because of the
> rangelock. (It appears it does work on Linux.)
Apps can install a timer that generates SIGALRM after 1 or 2 seconds.
The syscall is then interrupted and the partial copy is reported.
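
A rough userland sketch of that idea, for illustration only. It assumes
copy_file_range(2) returns a short count (or EINTR if nothing was copied
yet) when a handled signal arrives, and that the rangelock is dropped on
each return, as described earlier in this thread. The helper names
copy_in_slices() and copy_path_in_slices() and the slice_sec parameter
are made up here, not an existing API.

```c
#define _GNU_SOURCE	/* glibc wants this for copy_file_range(); assumed harmless elsewhere */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Empty handler: its only job is to make the blocked syscall return. */
static void
on_alarm(int sig)
{
	(void)sig;
}

/*
 * Hypothetical helper: copy len bytes from infd to outfd while a
 * periodic SIGALRM interrupts copy_file_range() every slice_sec seconds.
 * Returns the number of bytes copied, or -1 with errno set.
 */
ssize_t
copy_in_slices(int infd, int outfd, size_t len, int slice_sec)
{
	struct sigaction sa, old_sa;
	struct itimerval it, old_it;
	ssize_t n, total = 0;
	int saved_errno = 0;

	assert(slice_sec > 0);
	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_alarm;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = 0;	/* no SA_RESTART: let the syscall return early */
	if (sigaction(SIGALRM, &sa, &old_sa) == -1)
		return (-1);

	memset(&it, 0, sizeof(it));
	it.it_value.tv_sec = slice_sec;		/* first expiry */
	it.it_interval.tv_sec = slice_sec;	/* then periodic */
	if (setitimer(ITIMER_REAL, &it, &old_it) == -1) {
		saved_errno = errno;
		sigaction(SIGALRM, &old_sa, NULL);
		errno = saved_errno;
		return (-1);
	}

	while ((size_t)total < len) {
		n = copy_file_range(infd, NULL, outfd, NULL,
		    len - (size_t)total, 0);
		if (n > 0)
			total += n;	/* partial copy; go around again */
		else if (n == 0)
			break;		/* EOF on the input file */
		else if (errno != EINTR) {
			saved_errno = errno;
			total = -1;
			break;
		}
		/* EINTR: interrupted before any byte was copied; retry. */
	}

	/* Put back the caller's timer and signal disposition. */
	setitimer(ITIMER_REAL, &old_it, NULL);
	sigaction(SIGALRM, &old_sa, NULL);
	if (total == -1)
		errno = saved_errno;
	return (total);
}

/* Hypothetical wrapper: same thing, but by path name. */
ssize_t
copy_path_in_slices(const char *src, const char *dst, int slice_sec)
{
	struct stat st;
	ssize_t total = -1;
	int infd, outfd, saved_errno;

	if ((infd = open(src, O_RDONLY)) == -1)
		return (-1);
	outfd = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (outfd != -1 && fstat(infd, &st) == 0)
		total = copy_in_slices(infd, outfd, (size_t)st.st_size,
		    slice_sec);
	saved_errno = errno;
	if (outfd != -1)
		close(outfd);
	close(infd);
	errno = saved_errno;
	return (total);
}
```

Something like copy_path_in_slices("/nfs/big", "/local/big", 2) would
then come back to userland roughly every 2 seconds. Whether a reader
waiting on the output file actually gets the rangelock between calls
depends on the kernel and is not something this sketch can guarantee.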

> 
> A partial copy_file_range() is normal and an NFSv4.2 server
> will return a partial copy after 1sec or so since there is a
> general understanding (not wired down in any RFC) that
> an RPC should always reply in 1-2sec.
> 
> rick