Re: Why does rangelock_enqueue() hang for hours?

From: Rick Macklem <rick.macklem_at_gmail.com>
Date: Thu, 23 Oct 2025 14:21:56 UTC
On Thu, Oct 23, 2025 at 2:54 AM Peter 'PMc' Much
<pmc@citylink.dinoex.sub.org> wrote:
>
> On Wed, Oct 22, 2025 at 08:52:00AM -0700, Rick Macklem wrote:
> ! On Tue, Oct 21, 2025 at 7:50 AM Bakul Shah <bakul@iitbombay.org> wrote:
> ! >
> ! > I didn't read this thread before commenting on the forum where Peter
> ! > first raised this issue. Adding the relevant part of my comment here:
> ! > +---
> ! > By git blame cat.c we find it was added on 2023-07-08 in commit 8113cc8276.
> ! > git log 8113cc8276 says
> ! >   cat: use copy_file_range(2) with fallback to previous behavior
> ! >
> ! >   This allows to use special filesystem features like server-side
> ! >   copying on NFS 4.2 or block cloning on OpenZFS 2.2.
> ! >
> ! > Maybe it should check that these conditions are met? That is, both files should be
> ! > remote or both files should be local for it to be really worth it. In any case IMHO
> ! > this should not be the default behavior. Still, it should not hang....
> ! Peter, you could try the attached trivial patch (untested).
> !
> ! I'm not sure if this is a reasonable thing to do, but at least you can report
> ! back to let us know if it fixes your problem?
>
>
> Hi Rick,
>
>   I tested the patch. And I did something more, like
> trying to update my Linux installation (which was unpleasant
> and didn't fully succeed) and have a look there. See below.
>
> The patch helps. Things on the writing side now look like this:
>
> ...
> 1.409706711 copy_file_range(0x3,0x0,0x4,0x0,0x7fffffffffffffff,0x0) = 393216 (0x60000)
> 1.216986006 copy_file_range(0x3,0x0,0x4,0x0,0x7fffffffffffffff,0x0) = 393216 (0x60000)
> 1.219576946 copy_file_range(0x3,0x0,0x4,0x0,0x7fffffffffffffff,0x0) = 393216 (0x60000)
> 1.025836739 copy_file_range(0x3,0x0,0x4,0x0,0x7fffffffffffffff,0x0) = 262144 (0x40000)
> ...
>
> More interestingly, the read access runs immediately; it does not wait
> for that one second to find a gap.
I suspect that was because you did it after the first copy_file_range() call.
The 2nd and subsequent calls would not start at offset 0, so the rangelock
would not start at offset 0 either.
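A rough sketch of that reasoning follows. It is a simplified userland model, not the kernel's rangelock code: the struct and function names are made up for illustration, and it only assumes that a reader waits when its byte range overlaps a range held for writing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a locked byte range [start, end); not a kernel type. */
struct byte_range {
	int64_t start;
	int64_t end;	/* exclusive */
};

/* Two half-open ranges overlap when each one starts before the other ends. */
static bool
ranges_overlap(struct byte_range a, struct byte_range b)
{
	assert(a.start <= a.end && b.start <= b.end);
	return (a.start < b.end && b.start < a.end);
}

/* Would a read of [off, off + len) have to wait behind write range w? */
static bool
read_would_block(struct byte_range w, int64_t off, int64_t len)
{
	struct byte_range r = { off, off + len };

	return (ranges_overlap(w, r));
}
```

In this model, a write range starting at offset 0 and running to the maximum offset blocks a read at offset 0, while a write range starting at 393216 (the return value of the first call in the trace above) leaves a read of the first 64 KiB alone.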

>
> But, I am still wondering: why do we do this? And then I found,
> Linux (6.12.38+kali-amd64) does not do it:
>
>
> $ strace cp XX XY
> ...
> copy_file_range(3, NULL, 4, NULL, 9223372035781033984, 0^Z
> [1]+ Stopped                  strace cp XX XY
> $ cp XY XZ
> $
>
> This does not block. And it does not split the copy_file_range()
> into chunks. FreeBSD 14.3 does block at this point.
As you probably already know, there is no standard for copy_file_range(2).
When I did it, the aim was to be Linux compatible, but I guess it is
no surprise that it isn't 100% compatible. (The Linux copy_file_range(2)
is a moving target. It started out as a libc function and then its semantics
changed significantly at some Linux version. I've forgotten which version.
Prior to that version, a copy_file_range() with a len argument that went
past EOF was not allowed, if I recall correctly?)

Range locking is required for read/write (I'm fairly sure that is in the POSIX
standard for them). When I did copy_file_range(2) for FreeBSD others
(I don't recall who) thought that it should do range locking to be consistent
with read/write, which made sense to me.

I will ask on freebsd-current@ (few read freebsd-fs@) to see what the
consensus is w.r.t. this. (I suspect the "return after 1sec" is preferred
over disabling range locking, but we'll see.) I will also run some tests
on the Linux system I have, to confirm what their semantics are for
a recent Linux kernel. (Don't expect to see the post for a little while.)
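
Until that is settled, a userland caller can approximate the effect by capping the len argument of each call. The following is a minimal sketch, not what cat(1) or cp(1) actually do: CHUNK_BYTES, rw_chunk() and copy_in_chunks() are hypothetical names, the 4 MiB cap is an arbitrary choice, and it assumes the range lock is taken per call for the requested range. EINTR handling is left out for brevity.

```c
#define _GNU_SOURCE	/* exposes copy_file_range(2) in glibc on Linux */
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK_BYTES	((size_t)4 * 1024 * 1024)	/* hypothetical per-call cap */

/* Plain read/write of one buffer. Returns bytes copied, 0 at EOF, -1 on error. */
static ssize_t
rw_chunk(int infd, int outfd, size_t max)
{
	char buf[65536];
	ssize_t n, w, done;

	n = read(infd, buf, max < sizeof(buf) ? max : sizeof(buf));
	if (n <= 0)
		return (n);
	for (done = 0; done < n; done += w) {
		w = write(outfd, buf + done, (size_t)(n - done));
		if (w <= 0)
			return (-1);
	}
	return (n);
}

/* Copy infd to outfd from their current offsets. Returns total bytes or -1. */
static off_t
copy_in_chunks(int infd, int outfd)
{
	off_t total = 0;
	ssize_t n;
	int use_cfr = 1;

	for (;;) {
		if (use_cfr) {
			/* Bounded len, so each call covers at most one chunk. */
			n = copy_file_range(infd, NULL, outfd, NULL,
			    CHUNK_BYTES, 0);
			if (n < 0 && (errno == EINVAL || errno == EXDEV ||
			    errno == ENOSYS || errno == EOPNOTSUPP)) {
				use_cfr = 0;	/* fall back to read/write */
				continue;
			}
		} else
			n = rw_chunk(infd, outfd, CHUNK_BYTES);
		if (n < 0)
			return (-1);
		if (n == 0)
			return (total);
		total += n;
	}
}
```

The trade-off is that a smaller cap gives readers more frequent gaps but means more system calls, and possibly less benefit from server-side copy or block cloning.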

rick

>
> BTW: this is another one of my creepy use-cases: freeze some
> job and forget about it - and if it happens to use cp somewhere,
> then all other reads traversing the concerned file (e.g. backup)
> would also freeze. And then after a week we wonder why we do not
> have backups.
>
> rgds,
> PMc
>
>
> ! >
> ! > > On Oct 21, 2025, at 6:28 AM, Rick Macklem <rick.macklem@gmail.com> wrote:
> ! > >
> ! > > On Tue, Oct 21, 2025 at 6:09 AM Peter 'PMc' Much
> ! > > <pmc@citylink.dinoex.sub.org> wrote:
> ! > >>
> ! > >>
> ! > >> This is 14.3-RELEASE.
> ! > >>
> ! > >> I am copying a file from a NFSv4 share to a local filesystem. This
> ! > >> takes a couple of hours.
> ! > >>
> ! > >> In the meantime I want to read that partially copied file. This is
> ! > >> not possible. The reading process locks up in rangelock_enqueue(),
> ! > >> unkillable(!), and only after the first slow copy has completed, it
> ! > >> will do its job.
> ! > >>
> ! > >> Even if I do the first copy to stdout with redirect to file, the
> ! > >> same problem happens. I.e.:
> ! > >>
> ! > >> $ cat /nfsshare/File > /localfs/File &
> ! > >> $ cat /localfs/File  --> HANGS unkillable
> ! > > This is caused by "cat" using copy_file_range(2), where the
> ! > > system call is taking a long time.
> ! > >
> ! > > The version done below makes "cat" not use copy_file_range(2).
> ! > > copy_file_range(2) is interruptible, but that stops the file copy.
> ! > > It also has a "return after 1sec" option.
> ! > > Maybe that option should be exposed to userland and used by
> ! > > "cat", "cp" and friends at least when enabled by a command line
> ! > > option. (I'll admit looking at a file while it is being copied is a bit odd?)
> ! > > The whole idea behind range-lock is to prevent a read/write syscall
> ! > > from seeing a partial write. It just happens that the "write" takes a long
> ! > > time in this case.
> ! > >
> ! > > Do others have thoughts on this? rick
> ! > >
> ! > >>
> ! > >> Only if I introduce another process, the tie is avoided:
> ! > >>
> ! > >> $ cat /nfsshare/File | cat > /localfs/File &
> ! > >> $ cat /localfs/File  --> WORKS
> ! > >>
> ! > >> I very much doubt that this is how it should be.
> ! > >>
> ! > >> Also, if I try to get some information about the supposed operation
> ! > >> of this "rangelock" feature, search engines point me to a
> ! > >> "rangelock(9)" manpage on man.freebsd.org, but that page doesn't
> ! > >> seem to exist. :(
> ! > >>
> ! > >
> ! >
>
>