Re: RFC: Should copy_file_range(2) return after a few seconds?
Date: Sun, 09 Nov 2025 00:42:35 UTC
On Sat, Nov 08, 2025 at 03:22:37PM -0800, Rick Macklem wrote:
> Hi,
>
> Peter Much reported a problem on the freebsd-fs@ mailing
> list on Oct. 21 under the Subject: "Why does rangelock_enqueue()
> hang for hours?".
>
> The problem was that he had a copy_file_range(2) copying
> between a large NFS file and a local file that was taking 2hrs.
> While this copy_file_range(2) was in progress, it was holding
> a rangelock for the entire output file, causing another process
> trying to read the output file to hang, waiting for the rangelock.
>
> Since copy_file_range(2) is not any standard (just trying to
> emulate the Linux one), there is no definitive answer w.r.t.
> should it hold rangelocks. However, that is how it is currently
> coded and I, personally, think it is appropriate to do so.
>
> Having a copy_file_range(2) syscall take two hours is
> definitely an unusual case, but it does seem that it is
> excessive?
>
> Peter tried a quick patch I gave him that limited the
> copy_file_range(2) to 1sec and it fixed the problem
> he was observing.
>
> Which brings me to the question...
> Should copy_file_range(2) be time limited?
> And, if the answer to this is "yes", how long do
> you think the time limit should be?
> (1sec, 2-5sec or ??)
>
> Note that the longer you allow copy_file_range(2)
> to continue, the more efficient it will be.
>
> Thanks in advance for any comments, rick

For me, limiting a syscall by its runtime is a very strange idea.
IMO it should not be done.

What can be done, I think, is to add signal interruption points to the
copying loop. AFAIR we request that a chunk of some size be copied.
After that chunk is copied, the kernel could call sig_intr() to check for
either interruption or suspend conditions. If it returns non-zero, the
loop can finish early and report the partial copy.
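
To illustrate the interruption-point idea, here is a rough userland sketch, not the actual kernel code. It assumes the copy proceeds in fixed-size chunks. The names chunked_copy(), fake_sig_intr(), pending_after_chunks and chunks_done are made up for the example; fake_sig_intr() only stands in for the kernel's sig_intr() so that the loop can be exercised outside the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical test knob: after how many chunks the fake "signal"
 * becomes pending.  A negative value means it never does.
 */
static int pending_after_chunks = -1;
static int chunks_done;

/*
 * Stand-in for the kernel's sig_intr(); this is NOT the real function.
 * Returns non-zero once the configured number of chunks has been copied.
 */
static int
fake_sig_intr(void)
{
	return (pending_after_chunks >= 0 &&
	    chunks_done >= pending_after_chunks);
}

/*
 * Copy len bytes from src to dst in pieces of at most chunk bytes.
 * After each piece, check for a pending interruption and, if there is
 * one, stop early.  The return value is the number of bytes copied,
 * which may be less than len (a partial copy).
 */
static size_t
chunked_copy(char *dst, const char *src, size_t len, size_t chunk)
{
	size_t done = 0;

	assert(chunk > 0);
	chunks_done = 0;
	while (done < len) {
		size_t n = (len - done < chunk) ? len - done : chunk;

		memcpy(dst + done, src + done, n);
		done += n;
		chunks_done++;

		/* Interruption point: report what has been copied so far. */
		if (fake_sig_intr() != 0)
			break;
	}
	return (done);
}
```

Since the check happens only after a chunk has completed, at least one chunk is always copied, so the caller sees forward progress and can simply call again for the remainder, as it already must for any short copy.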