Re: Why does rangelock_enqueue() hang for hours?

From: Peter 'PMc' Much <pmc_at_citylink.dinoex.sub.org>
Date: Tue, 21 Oct 2025 14:29:00 UTC

On Tue, Oct 21, 2025 at 06:28:20AM -0700, Rick Macklem wrote:
! On Tue, Oct 21, 2025 at 6:09 AM Peter 'PMc' Much
! <pmc@citylink.dinoex.sub.org> wrote:
! >
! >
! > This is 14.3-RELEASE.
! >
! > I am copying a file from a NFSv4 share to a local filesystem. This
! > takes a couple of hours.
! >
! > In the meantime I want to read that partially copied file. This is
! > not possible. The reading process locks up in rangelock_enqueue(),
! > unkillable(!), and only after the first slow copy has completed, it
! > will do its job.
! >
! > Even if I do the first copy to stdout with redirect to file, the
! > same problem happens. I.e.:
! >
! >  $ cat /nfsshare/File > /localfs/File &
! >  $ cat /localfs/File  --> HANGS unkillable
! This is caused by "cat" using copy_file_range(2), where the
! system call is taking a long time.

Hi Rick,
  I'm currently reading the source and am starting to figure that out. :)

So this copy_file_range(2) tries to keep the data inside the kernel
as much as possible, and that would explain why two independent
processes lock up against each other, where (traditionally) the second
copy process shouldn't even know about what the first copy process does.
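
For comparison, here is a minimal sketch of the traditional userland copy
loop. Every read(2) and write(2) is a separate, short syscall, so anything
the kernel holds on behalf of one call is released when that call returns.
copy_rw() is a made-up name for illustration, not code from cat(1), and the
64 KB buffer size is an arbitrary choice:

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from cat(1): copy in_fd to out_fd with
 * plain read(2)/write(2). Returns bytes copied, or -1 with errno set.
 */
off_t
copy_rw(int in_fd, int out_fd)
{
	char buf[64 * 1024];	/* arbitrary buffer size */
	off_t total = 0;

	assert(in_fd >= 0 && out_fd >= 0);
	for (;;) {
		ssize_t n = read(in_fd, buf, sizeof(buf));
		if (n == 0)
			return (total);		/* EOF */
		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry the read */
			return (-1);
		}
		/* write(2) may be short; loop until the buffer is drained. */
		for (ssize_t off = 0; off < n; ) {
			ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
			if (w < 0) {
				if (errno == EINTR)
					continue;
				return (-1);
			}
			off += w;
		}
		total += n;
	}
}
```

The data crosses into userland and back once per buffer, which is exactly
the overhead copy_file_range(2) is meant to avoid.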

I am not really interested in a quick workaround; I can live with
the behaviour as I now know why it happens (and nobody else seems
to have complained yet) - but still I don't think it is a good idea
to entirely lock out such use cases.

! Maybe that option should be exposed to userland and used by
! "cat", "cp" and friends at least when enabled by a command line
! option. (I'll admit looking at a file while it is being copied is a
! bit odd?)

I don't think so. ;) In the actual case I wanted to look into
a video while it trickled down from my backend site through my mom's
slow uplink - it was too slow to directly watch from the NFS-share,
but still faster than the actual playback speed.

! The whole idea behind range-lock is to prevent a read/write syscall
! from seeing a partial write.

Alright, traditionally ;) we knew that a file that is read while being
written is inconsistent, and traditionally ;) we were reminded to use
some explicit locking in those cases where we wanted to avoid such
situations.
But nowadays this protection is no longer opt-in. So I would suggest
that there should at least be a way to opt out.
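
The traditional opt-in approach referred to above can be sketched with
flock(2) advisory locks. The helper names are hypothetical. The point is
that only processes that ask for the lock are ever held up, while a plain
read(2) that never calls flock() proceeds regardless:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helpers illustrating opt-in advisory locking, flock(2). */

/* Writer side: block until we hold an exclusive lock on fd. */
int
lock_for_writing(int fd)
{
	assert(fd >= 0);
	return (flock(fd, LOCK_EX));
}

/*
 * Reader side: try a shared lock without blocking. Returns 0 on success,
 * or -1 with errno == EWOULDBLOCK while another open of the file holds
 * an exclusive lock.
 */
int
try_lock_for_reading(int fd)
{
	assert(fd >= 0);
	return (flock(fd, LOCK_SH | LOCK_NB));
}

/* Either side: drop whatever lock is held through fd. */
int
unlock_file(int fd)
{
	assert(fd >= 0);
	return (flock(fd, LOCK_UN));
}
```

A writer would call lock_for_writing() before producing the file and
unlock_file() afterwards. A reader that cares about consistency calls
try_lock_for_reading() first. A reader that does not care simply reads.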

Another question I haven't yet looked into is: do we really need to
keep this lock held across the entire xx GB of the file for the whole
duration of the copy?
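
One way to probe that question from userland is to request a bounded
length per call instead of the whole file. This is a rough sketch under the
unverified assumption that the kernel locks only the range requested by
each copy_file_range(2) call. copy_in_chunks() and its chunk parameter are
hypothetical names, and EINTR handling is left out for brevity:

```c
#define _GNU_SOURCE	/* for copy_file_range() on Linux/glibc; unused on FreeBSD */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy src to dst, asking the kernel for at most
 * `chunk` bytes per call. Returns bytes copied, or -1 with errno set.
 */
off_t
copy_in_chunks(const char *src, const char *dst, size_t chunk)
{
	off_t total = 0;
	ssize_t n = -1;
	int in_fd, out_fd, saved;

	assert(chunk > 0);
	if ((in_fd = open(src, O_RDONLY)) < 0)
		return (-1);
	if ((out_fd = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644)) < 0) {
		saved = errno;
		close(in_fd);
		errno = saved;
		return (-1);
	}
	/* NULL offsets: use and advance each descriptor's own file offset. */
	while ((n = copy_file_range(in_fd, NULL, out_fd, NULL, chunk, 0)) > 0)
		total += n;
	saved = errno;		/* keep the copy's errno across close(2) */
	close(in_fd);
	close(out_fd);
	errno = saved;
	return (n < 0 ? -1 : total);
}
```

If the assumption holds, then with a chunk of, say, 1 MB a concurrent
reader would wait for at most one chunk rather than for the whole
transfer, while the data still stays inside the kernel.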

rgds, PMc