Re: Why does rangelock_enqueue() hang for hours?

From: Konstantin Belousov <kostikbel_at_gmail.com>
Date: Thu, 23 Oct 2025 14:49:27 UTC
On Thu, Oct 23, 2025 at 07:21:56AM -0700, Rick Macklem wrote:
> On Thu, Oct 23, 2025 at 2:54 AM Peter 'PMc' Much
> <pmc@citylink.dinoex.sub.org> wrote:
> >
> > On Wed, Oct 22, 2025 at 08:52:00AM -0700, Rick Macklem wrote:
> > ! On Tue, Oct 21, 2025 at 7:50 AM Bakul Shah <bakul@iitbombay.org> wrote:
> > ! >
> > ! > I didn't read this thread before commenting on the forum where Peter
> > ! > first raised this issue. Adding the relevant part of my comment here:
> > ! > +---
> > ! > By git blame cat.c we find it was added on 2023-07-08 in commit 8113cc8276.
> > ! > git log 8113cc8276 says
> > ! >   cat: use copy_file_range(2) with fallback to previous behavior
> > ! >
> > ! >   This allows to use special filesystem features like server-side
> > ! >   copying on NFS 4.2 or block cloning on OpenZFS 2.2.
> > ! >
> > ! > Maybe it should check that these conditions are met? That is, both files should be
> > ! > remote or both files should be local for it to be really worth it. In any case IMHO
> > ! > this should not be the default behavior. Still, it should not hang....
> > ! Peter, you could try the attached trivial patch (untested).
> > !
> > ! I'm not sure if this is a reasonable thing to do, but at least you can report
> > ! back to let us know if it fixes your problem?
> >
> >
> > Hi Rick,
> >
> >   I tested the patch. And I did something more, like
> > trying to update my linux installation (which was unpleasant
> > and didn't fully succeed) and have a look there. See below.
> >
> > The patch helps. Things on the writing side now look like this:
> >
> > ...
> > 1.409706711 copy_file_range(0x3,0x0,0x4,0x0,0x7fffffffffffffff,0x0) = 393216 (0x60000)
> > 1.216986006 copy_file_range(0x3,0x0,0x4,0x0,0x7fffffffffffffff,0x0) = 393216 (0x60000)
> > 1.219576946 copy_file_range(0x3,0x0,0x4,0x0,0x7fffffffffffffff,0x0) = 393216 (0x60000)
> > 1.025836739 copy_file_range(0x3,0x0,0x4,0x0,0x7fffffffffffffff,0x0) = 262144 (0x40000)
> > ...
> >
> > More interesting, the read access runs immediately, it does not wait
> > for that one second to find a gap.
> I suspect that was because you did it after the first copy_file_range() call.
> The 2nd and subsequent calls would not start at offset 0, so the rangelock
> would not start at offset 0 either.
> 
> >
> > But, I am still wondering: why do we do this? And then I found,
> > Linux (6.12.38+kali-amd64) does not do it:
> >
> >
> > $ strace cp XX XY
> > ...
> > copy_file_range(3, NULL, 4, NULL, 9223372035781033984, 0^Z
> > [1]+ Stopped                  strace cp XX XY
> > $ cp XY XZ
> > $
> >
> > This does not block. And it does not split the copy_file_range()
> > into chunks. FreeBSD 14.3 does block at this point.
> As you probably already know, there is no standard for copy_file_range(2).
> When I did it, the aim was to be Linux compatible, but I guess it is
> no surprise that it isn't 100% compatible. (The Linux copy_file_range(2)
> is a moving target. It started out as a libc function and then its semantics
> changed significantly at some Linux version. I've forgotten which version.
> Prior to that version, a copy_file_range() with a len argument that went
> past EOF was not allowed, if I recall correctly?)
> 
> Range locking is required for read/write (I'm fairly sure that is in the POSIX
> standard for them). When I did copy_file_range(2) for FreeBSD others
> (I don't recall who) thought that it should do range locking to be consistent
> with read/write, which made sense to me.

See https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html
2.9.7 Thread Interactions with Regular File Operations

[List of file I/O functions, including read() and write()]

If two threads each call one of these functions, each call shall either
see all of the specified effects of the other call, or none of them.

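To make the trade-off concrete, here is a minimal userland sketch (an
illustration only, not the patch discussed in this thread and not what
cat(1) or cp(1) actually do). It assumes that the range lock taken by
copy_file_range(2) covers only the byte range requested by each call, so
capping the length keeps each lock short-lived. The price is exactly the
guarantee quoted above: atomicity then holds per call, so a concurrent
reader can see a partially copied file. CHUNK_BYTES, copy_rw() and
copy_in_chunks() are hypothetical names, and falling back on any early
error is a simplification.

```c
#define _GNU_SOURCE	/* glibc needs this for copy_file_range(); unused on FreeBSD */
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical cap on how much a single syscall may copy (and lock). */
#define	CHUNK_BYTES	((size_t)1 << 20)

/* Plain read(2)/write(2) loop, used when copy_file_range(2) is unusable. */
static int
copy_rw(int infd, int outfd)
{
	static char buf[64 * 1024];
	ssize_t n, off, w;

	while ((n = read(infd, buf, sizeof(buf))) != 0) {
		if (n < 0) {
			if (errno == EINTR)
				continue;
			return (-1);
		}
		off = 0;
		while (off < n) {	/* handle short writes */
			w = write(outfd, buf + off, (size_t)(n - off));
			if (w < 0) {
				if (errno == EINTR)
					continue;
				return (-1);
			}
			off += w;
		}
	}
	return (0);
}

/*
 * Copy infd to outfd using the current file offsets, at most CHUNK_BYTES
 * per copy_file_range(2) call.  Returns 0 on EOF, -1 on error.
 */
static int
copy_in_chunks(int infd, int outfd)
{
	ssize_t n;
	int copied_any = 0;

	for (;;) {
		n = copy_file_range(infd, NULL, outfd, NULL, CHUNK_BYTES, 0);
		if (n == 0)
			return (0);		/* EOF on the input */
		if (n > 0) {
			copied_any = 1;		/* short copies are fine, loop */
			continue;
		}
		if (errno == EINTR)
			continue;
		/* Simplification: fall back only if nothing was copied yet. */
		if (!copied_any)
			return (copy_rw(infd, outfd));
		return (-1);
	}
}
```

A caller would open both files and call copy_in_chunks(infd, outfd).
Whether readers of the output file then get through between chunks
depends on how the kernel scopes the lock, which is the open question
in this thread.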
> 
> I will ask on freebsd-current@ (few read freebsd-fs@) to see what the
> consensus is w.r.t. this. (I suspect the "return after 1sec" is preferred
> over disabling range locking, but we'll see.) I will also run some tests
> on the Linux system I have, to confirm what their semantics are for
> a recent Linux kernel. (Don't expect to see the post for a little while.)
> 
> rick
> 
> >
> > BTW: this is another one of my creepy use-cases: freeze some
> > job and forget about it - and if it happens to use cp somewhere,
> > then all other reads traversing the concerned file (e.g. backup)
> > would also freeze. And then after a week we wonder why we do not
> > have backups.
> >
> > rgds,
> > PMc
> >
> >
> > ! >
> > ! > > On Oct 21, 2025, at 6:28 AM, Rick Macklem <rick.macklem@gmail.com> wrote:
> > ! > >
> > ! > > On Tue, Oct 21, 2025 at 6:09 AM Peter 'PMc' Much
> > ! > > <pmc@citylink.dinoex.sub.org> wrote:
> > ! > >>
> > ! > >>
> > ! > >> This is 14.3-RELEASE.
> > ! > >>
> > ! > >> I am copying a file from a NFSv4 share to a local filesystem. This
> > ! > >> takes a couple of hours.
> > ! > >>
> > ! > >> In the meantime I want to read that partially copied file. This is
> > ! > >> not possible. The reading process locks up in rangelock_enqueue(),
> > ! > >> unkillable(!), and only after the first slow copy has completed, it
> > ! > >> will do its job.
> > ! > >>
> > ! > >> Even if I do the first copy to stdout with redirect to file, the
> > ! > >> same problem happens. I.e.:
> > ! > >>
> > ! > >> $ cat /nfsshare/File > /localfs/File &
> > ! > >> $ cat /localfs/File  --> HANGS unkillable
> > ! > > This is caused by "cat" using copy_file_range(2), where the
> > ! > > system call is taking a long time.
> > ! > >
> > ! > > The version done below makes "cat" not use copy_file_range(2).
> > ! > > (copy_file_range(2) is interruptible, but that stops the file copy.
> > ! > > It also has a "return after 1sec" option.)
> > ! > > Maybe that option should be exposed to userland and used by
> > ! > > "cat", "cp" and friends at least when enabled by a command line
> > ! > > option. (I'll admit looking at a file while it is being copied is a bit odd?)
> > ! > > The whole idea behind range-lock is to prevent a read/write syscall
> > ! > > from seeing a partial write. It just happens that the "write" takes a long
> > ! > > time in this case.
> > ! > >
> > ! > > Do others have thoughts on this? rick
> > ! > >
> > ! > >>
> > ! > >> Only if I introduce another process, the tie is avoided:
> > ! > >>
> > ! > >> $ cat /nfsshare/File | cat > /localfs/File &
> > ! > >> $ cat /localfs/File  --> WORKS
> > ! > >>
> > ! > >> I very much doubt that this is how it should be.
> > ! > >>
> > ! > >> Also, if I try to get some information about the supposed operation
> > ! > >> of this "rangelock" feature, search engines point me to a
> > ! > >> "rangelock(9)" manpage on man.freebsd.org, but that page doesn't
> > ! > >> seem to exist. :(
> > ! > >>
> > ! > >
> > ! >
> >
> >
>