Re: Why does rangelock_enqueue() hang for hours?

From: Rick Macklem <rick.macklem_at_gmail.com>
Date: Fri, 24 Oct 2025 02:03:51 UTC
On Thu, Oct 23, 2025 at 11:29 AM Bakul Shah <bakul@iitbombay.org> wrote:
>
> On Oct 22, 2025, at 8:52 AM, Rick Macklem <rick.macklem@gmail.com> wrote:
> >
> > Peter, you could try the attached trivial patch (untested).
> >
> > I'm not sure if this is a reasonable thing to do, but at least you can report
> > back to let us know if it fixes your problem?
>
> One thing I had suggested was to use multiple copy_file_range() calls,
> say 4MB each in cp or cat as neither provide any atomic guarantees --
> normally they would repeat read && write until done. At the very least
> this kicks the can down the road!
The problem with this plan is that it limits the possible optimization
that copy_file_range(2) provides.

For example:
For ZFS, block cloning (basically copy on write, as I understand it) is
now enabled by default.
If I understand it correctly, this means that a copy of a very large file
will still happen quickly, for both local copying within a zpool and remote
copying within a zpool via NFSv4.2.
--> Setting the copy size to 4Mbytes (or anything less than EOF on the
     input file) makes this much less efficient.
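To make the trade-off concrete, here is a rough userland sketch of a copy
loop around copy_file_range(2). It is not the code in cp(1) or cat(1), and
the helper name copy_path() is made up for illustration. It assumes a
system where copy_file_range() is declared in <unistd.h> (FreeBSD 13 or
later, or Linux with glibc and _GNU_SOURCE). With chunk == 0 every call asks
for the rest of the file, so a file system with block cloning gets the
chance to handle the whole file at once; a nonzero chunk (e.g. 4 Mbytes)
caps each call, as in Bakul's suggestion.

```c
#define _GNU_SOURCE	/* glibc needs this for copy_file_range(); harmless on FreeBSD */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy the file "from" to "to" with copy_file_range(2).
 *   chunk == 0: request the whole remaining file on each call.
 *   chunk  > 0: cap every call at "chunk" bytes (e.g. 4 Mbytes).
 * Returns the number of bytes copied, or -1 with errno set.
 */
static off_t copy_path(const char *from, const char *to, size_t chunk)
{
	int infd, outfd, saved;
	off_t total = 0;

	if ((infd = open(from, O_RDONLY)) < 0)
		return -1;
	if ((outfd = open(to, O_WRONLY | O_CREAT | O_TRUNC, 0644)) < 0) {
		saved = errno;
		close(infd);
		errno = saved;
		return -1;
	}
	for (;;) {
		size_t len = chunk > 0 ? chunk : (size_t)SSIZE_MAX;
		/* NULL offsets: use and advance each descriptor's file offset. */
		ssize_t n = copy_file_range(infd, NULL, outfd, NULL, len, 0);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			total = -1;
			break;
		}
		if (n == 0)	/* EOF on the input file */
			break;
		assert((size_t)n <= len);	/* never more than requested */
		total += n;
	}
	saved = errno;
	close(infd);
	close(outfd);
	errno = saved;
	return total;
}
```

Either way the caller loops until copy_file_range() returns 0; the only
difference is how much each call is allowed to cover.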

The problem case is when the file system does not provide an
efficient VOP_COPY_FILE_RANGE() and the code falls back to
vn_generic_copy_file_range(), which copies via a read/write loop.

The patch I posted limits vn_generic_copy_file_range() so that it
runs its read/write loop for only 1sec before returning (with a short
copy), which handles this case.
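The idea of a time-limited read/write loop can be illustrated in
userland. This is not the patch itself (which changes
vn_generic_copy_file_range() in the kernel); copy_for_at_most() and the
16 Kbyte buffer are hypothetical. It checks CLOCK_MONOTONIC after each
buffer and returns a short count once the time budget is used up, so the
caller simply calls it again, just as it would after a short return from
copy_file_range(2).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

#define COPY_BUF (16 * 1024)	/* hypothetical buffer size */

/* Seconds elapsed between two CLOCK_MONOTONIC readings. */
static double elapsed(const struct timespec *a, const struct timespec *b)
{
	return (double)(b->tv_sec - a->tv_sec) +
	    (double)(b->tv_nsec - a->tv_nsec) / 1e9;
}

/*
 * Copy up to len bytes from infd to outfd with a read/write loop, but stop
 * after roughly "budget" seconds and return a short count.  At least one
 * buffer is copied per call (if data is available), so repeated calls always
 * make progress.  Returns bytes copied (0 at EOF or when len == 0), or -1 on
 * error.
 */
static ssize_t copy_for_at_most(int infd, int outfd, size_t len, double budget)
{
	char buf[COPY_BUF];
	struct timespec start, now;
	size_t done = 0;

	clock_gettime(CLOCK_MONOTONIC, &start);
	while (done < len) {
		size_t want = len - done < sizeof(buf) ? len - done : sizeof(buf);
		ssize_t r = read(infd, buf, want);

		if (r < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (r == 0)
			break;		/* EOF on infd */
		assert((size_t)r <= want);
		for (ssize_t off = 0; off < r;) {
			ssize_t w = write(outfd, buf + off, (size_t)(r - off));

			if (w < 0) {
				if (errno == EINTR)
					continue;
				return -1;
			}
			off += w;
		}
		done += (size_t)r;
		clock_gettime(CLOCK_MONOTONIC, &now);
		if (elapsed(&start, &now) >= budget)
			break;		/* time is up: return a short copy */
	}
	return (ssize_t)done;
}
```

Returning early bounds how long any single call (and any lock it holds)
can last, at the cost of more calls for a large file.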

The other problematic case is a Copy done over NFSv4.2, since there
is no way of knowing how long it will take.
There is a principle in NFS that an RPC should not take more than
1-2sec, but that is not specified in the RFCs, so it cannot be
guaranteed that an NFSv4.2 server will return a Copy reply in 1-2sec,
although it normally should.

As I said, I'll ask on freebsd-current@ to try to get opinions on
what, if anything, should be done about this.

rick

>
> This way reads would be blocked only when they catch up with writes.
>
>
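Bakul's point about reads can be modeled with half-open byte ranges. This
is a simplified model, not FreeBSD's rangelock code: it assumes each
copy_file_range() call write-locks the output range it was asked to copy
until the call returns, and that a reader waits only when its range
overlaps that lock. The names struct brange, ranges_overlap() and
reader_must_wait() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Half-open byte range [start, end) within a file. */
struct brange {
	uint64_t start;
	uint64_t end;
};

/* Two half-open ranges conflict iff each one starts before the other ends. */
static bool ranges_overlap(struct brange a, struct brange b)
{
	return a.start < b.end && b.start < a.end;
}

/*
 * Simplified model (an assumption, not FreeBSD's rangelock code): the
 * copy_file_range() call in progress started at output offset call_start
 * and write-locks the range it was asked to copy until it returns.
 * chunk == 0 models one call covering the rest of the file; chunk > 0
 * models calls capped at chunk bytes.  Returns true if a reader of rd
 * would have to wait.
 */
static bool reader_must_wait(struct brange rd, uint64_t call_start,
    uint64_t chunk, uint64_t file_end)
{
	struct brange locked;

	assert(call_start <= file_end);
	locked.start = call_start;
	if (chunk == 0 || chunk > file_end - call_start)
		locked.end = file_end;
	else
		locked.end = call_start + chunk;
	return ranges_overlap(rd, locked);
}
```

Under this model, with one whole-file call a reader anywhere in the file
waits until the call returns; with 4 Mbyte chunks a reader that trails the
copy waits only once it reaches the chunk currently being copied.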