Date: Tue, 11 Nov 2025 18:03:26 +0200
From: Konstantin Belousov
To: Mark Johnston
Cc: Rick Macklem, Don Lewis, Ronald Klop, "Peter 'PMc' Much", FreeBSD CURRENT
Subject: Re: RFC: Should copy_file_range(2) return after a few seconds?
List-Archive: https://lists.freebsd.org/archives/freebsd-current

On Tue, Nov 11, 2025 at 09:33:52AM -0500, Mark Johnston wrote:
> On Mon, Nov 10, 2025 at 01:02:54AM -0800, Rick Macklem wrote:
> > On Mon, Nov 10, 2025 at 12:15 AM Don Lewis wrote:
> > >
> > > On 9 Nov, Rick Macklem wrote:
> > > > On Sat, Nov 8, 2025 at 11:14 PM Ronald Klop wrote:
> > > >>
> > > >>
> > > >> From: Rick Macklem
> > > >> Date: 9 November 2025 00:23
> > > >> To: FreeBSD CURRENT
> > > >> CC: Peter 'PMc' Much
> > > >> Subject: RFC: Should copy_file_range(2) return after a few seconds?
> > > >>
> > > >> Hi,
> > > >>
> > > >> Peter Much reported a problem on the freebsd-fs@ mailing
> > > >> list on Oct. 21 under the Subject: "Why does rangelock_enqueue()
> > > >> hang for hours?".
> > > >>
> > > >> The problem was that he had a copy_file_range(2) copying
> > > >> between a large NFS file and a local file that was taking 2hrs.
> > > >> While this copy_file_range(2) was in progress, it was holding
> > > >> a rangelock for the entire output file, causing another process
> > > >> trying to read the output file to hang, waiting for the rangelock.
> > > >>
> > > >> Since copy_file_range(2) is not any standard (just trying to
> > > >> emulate the Linux one), there is no definitive answer w.r.t.
> > > >> should it hold rangelocks.
> > > >> However, that is how it is currently
> > > >> coded and I, personally, think it is appropriate to do so.
> > > >>
> > > >> Having a copy_file_range(2) syscall take two hours is
> > > >> definitely an unusual case, but it does seem that it is
> > > >> excessive?
> > > >>
> > > >> Peter tried a quick patch I gave him that limited the
> > > >> copy_file_range(2) to 1sec and it fixed the problem
> > > >> he was observing.
> > > >>
> > > >> Which brings me to the question...
> > > >> Should copy_file_range(2) be time limited?
> > > >> And, if the answer to this is "yes", how long do
> > > >> you think the time limit should be?
> > > >> (1sec, 2-5sec or ??)
> > > >>
> > > >> Note that the longer you allow copy_file_range(2)
> > > >> to continue, the more efficient it will be.
> > > >>
> > > >> Thanks in advance for any comments, rick
> > > >>
> > > >> ________________________________
> > > >>
> > > >>
> > > >>
> > > >> Why is this locking needed?
> > > >> AFAIK Unix has advisory locking, so if you read a file somebody else is writing the result is your own problem. It is up to the applications to adhere to the locking.
> > > >> Is this a lock different than file locking from user space?
> > > > Yes. A rangelock is used for a byte range during a read(2) or
> > > > write(2) to ensure that they are serialized. This is a POSIX
> > > > requirement. (See this post by kib@ in the original email
> > > > discussion. https://lists.freebsd.org/archives/freebsd-fs/2025-October/004704.html)
> > > >
> > > > Since there is no POSIX standard for copy_file_range(), it could
> > > > be argued that range locking isn't required for copy_file_range(),
> > > > but that makes it inconsistent with read(2)/write(2) behaviour.
> > > > (I, personally, am more comfortable with a return after N sec
> > > > than removing the range locking, but that's just my opinion.)
> > > >
> > > > rick
> > > >
> > > >> Why can’t this tail a file that is being written by copy_file_range if none of the applications request a lock?
> > >
> > > Since writes don't go backwards, it would seem to make sense to advance
> > > the start of the range lock as the copy proceeds.
> > The current code does the rangelock above the VOP layer and,
> > for ZFS, if block cloning is enabled, the entire copy happens
> > all at once and fairly quickly (it's copy on write as I understand it).
>
> I think the rangelock holder can detect that other threads are sleeping,
> blocked on the lock. In this case, perhaps filesystems should
> periodically check for contention, and if present could return to the
> syscall layer to release the lock and give other threads a chance to
> proceed?

And what is the use of rangelocks then? The proposed change would break
the atomicity of reads vs writes.

>
> > I can't recall for certain, but I think the rangelock must be acquired
> > before the vnode lock(s), so I don't think moving it to below the
> > VOP layer is practical?
> >
> > rick
> >
> > > As long as the read
> > > position + length is before the write position, there is no reason to
> > > block the read. Running "cat outfile" would look a lot like tail -f
> > > because cat would only see the new data because it would temporarily
> > > block if it ever caught up with the copy.
> > >
> > > tail is a bit funky, though. If the size of the destination file is
> > > updated periodically during the copy, tail could return early with an
> > > earlier part of the file. If the size is updated immediately to the
> > > final size, then tail will wait for the copy to complete, but will
> > > output the true end of the file.
> > >
> > > What about backups?
> >
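
For readers following the quoted suggestion to advance the start of the range lock as the copy proceeds, here is a minimal userland sketch of that idea only. It is not the kernel's rangelock code: struct range, read_blocks() and advance_lock() are hypothetical names made up for illustration, and the model says nothing about the atomicity objection raised above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a write range lock covering bytes [start, end). */
struct range {
	uint64_t start;
	uint64_t end;	/* exclusive */
};

/*
 * True when a read of [off, off + len) overlaps the locked range and
 * would therefore have to wait.  Offset overflow is ignored in this sketch.
 */
static bool
read_blocks(const struct range *lock, uint64_t off, uint64_t len)
{
	if (len == 0)
		return (false);
	return (off < lock->end && off + len > lock->start);
}

/*
 * Shrink the lock from the front after `copied` more bytes were written,
 * so that the already-copied prefix becomes readable.
 */
static void
advance_lock(struct range *lock, uint64_t copied)
{
	assert(copied <= lock->end - lock->start);
	lock->start += copied;
}
```

In this model a reader whose position + length stays at or below the lock start never waits, while a reader that catches up with the copy blocks until the next advance_lock() call, which matches the "cat outfile behaves like tail -f" description in the quoted text.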