Date: Tue, 11 Nov 2025 09:33:52 -0500
From: Mark Johnston
To: Rick Macklem
Cc: Don Lewis, Ronald Klop, Peter 'PMc' Much, FreeBSD CURRENT
Subject: Re: RFC: Should copy_file_range(2) return after a few seconds?
References: <2100145914.14642.1762672441817@localhost>
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current

On Mon, Nov 10, 2025 at 01:02:54AM -0800, Rick Macklem wrote:
> On Mon, Nov 10, 2025 at 12:15 AM Don Lewis wrote:
> >
> > On 9 Nov, Rick Macklem wrote:
> > > On Sat, Nov 8, 2025 at 11:14 PM Ronald Klop wrote:
> > >>
> > >>
> > >> From: Rick Macklem
> > >> Date: 9 November 2025 00:23
> > >> To: FreeBSD CURRENT
> > >> CC: Peter 'PMc' Much
> > >> Subject: RFC: Should copy_file_range(2) return after a few seconds?
> > >>
> > >> Hi,
> > >>
> > >> Peter Much reported a problem on the freebsd-fs@ mailing
> > >> list on Oct. 21 under the Subject: "Why does rangelock_enqueue()
> > >> hang for hours?".
> > >>
> > >> The problem was that he had a copy_file_range(2) copying
> > >> between a large NFS file and a local file that was taking 2hrs.
> > >> While this copy_file_range(2) was in progress, it was holding
> > >> a rangelock for the entire output file, causing another process
> > >> trying to read the output file to hang, waiting for the rangelock.
> > >>
> > >> Since copy_file_range(2) is not in any standard (just trying to
> > >> emulate the Linux one), there is no definitive answer as to
> > >> whether it should hold rangelocks. However, that is how it is
> > >> currently coded and I, personally, think it is appropriate to do so.
> > >>
> > >> Having a copy_file_range(2) syscall take two hours is
> > >> definitely an unusual case, but it does seem
> > >> excessive.
> > >>
> > >> Peter tried a quick patch I gave him that limited the
> > >> copy_file_range(2) to 1sec and it fixed the problem
> > >> he was observing.
> > >>
> > >> Which brings me to the question...
> > >> Should copy_file_range(2) be time limited?
> > >> And, if the answer to this is "yes", how long do
> > >> you think the time limit should be?
> > >> (1sec, 2-5sec or ??)
> > >>
> > >> Note that the longer you allow copy_file_range(2)
> > >> to continue, the more efficient it will be.
> > >>
> > >> Thanks in advance for any comments, rick
> > >>
> > >> ________________________________
> > >>
> > >> Why is this locking needed?
> > >> AFAIK Unix has advisory locking, so if you read a file somebody
> > >> else is writing the result is your own problem. It is up to the
> > >> applications to adhere to the locking.
> > >> Is this a lock different than file locking from user space?
> > > Yes. A rangelock is used for a byte range during a read(2) or
> > > write(2) to ensure that they are serialized. This is a POSIX
> > > requirement. (See this post by kib@ in the original email
> > > discussion. https://lists.freebsd.org/archives/freebsd-fs/2025-October/004704.html)
> > >
> > > Since there is no POSIX standard for copy_file_range(), it could
> > > be argued that range locking isn't required for copy_file_range(),
> > > but that makes it inconsistent with read(2)/write(2) behaviour.
> > > (I, personally, am more comfortable with a return after N sec
> > > than removing the range locking, but that's just my opinion.)
> > >
> > > rick
> > >
> > >> Why can’t this tail a file that is being written by
> > >> copy_file_range if none of the applications request a lock?
> >
> > Since writes don't go backwards, it would seem to make sense to advance
> > the start of the range lock as the copy proceeds.
> The current code does the rangelock above the VOP layer and,
> for ZFS, if block cloning is enabled, the entire copy happens
> all at once and fairly quickly (it's copy on write as I understand it).

I think the rangelock holder can detect that other threads are sleeping,
blocked on the lock.  In this case, perhaps filesystems should
periodically check for contention and, if there is any, return to the
syscall layer to release the lock and give other threads a chance to
proceed?

> I can't recall for certain, but I think the rangelock must be acquired
> before the vnode lock(s), so I don't think moving it to below the
> VOP layer is practical?
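
To make the contention-check idea above a bit more concrete, here is a
rough userspace sketch in C.  It is not kernel code and does not use the
real rangelock interfaces: copy_with_yield(), copy_all(), the contended_fn
callback and the tiny CHUNK size are all hypothetical names made up for
illustration.  The callback stands in for "are other threads blocked on
the lock?".  The copy always finishes at least one chunk before checking,
so a caller that retries is guaranteed to make progress.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for "is another thread blocked on the lock?". */
typedef int (*contended_fn)(void *arg);

/* Tiny on purpose so the example is easy to trace by hand. */
#define CHUNK 4

/* Example predicates: nobody is ever waiting / somebody is always waiting. */
static int never_contended(void *arg) { (void)arg; return 0; }
static int always_contended(void *arg) { (void)arg; return 1; }

/*
 * Copy up to len bytes, one chunk at a time.  After each chunk, ask the
 * callback whether anyone is waiting; if so, stop early and report how
 * many bytes were copied so the caller can drop the lock and retry.
 */
static size_t
copy_with_yield(char *dst, const char *src, size_t len,
    contended_fn contended, void *arg)
{
    size_t done = 0;

    assert(dst != NULL && src != NULL);
    while (done < len) {
        size_t n = (len - done < CHUNK) ? len - done : CHUNK;

        memcpy(dst + done, src + done, n);
        done += n;
        if (done < len && contended(arg))
            break;      /* short return: give waiters a turn */
    }
    return (done);
}

/*
 * Caller side: treat a short return like a short write(2) and resume
 * from where the previous call stopped.  Returns the number of calls
 * that were needed to copy everything.
 */
static int
copy_all(char *dst, const char *src, size_t len,
    contended_fn contended, void *arg)
{
    size_t off = 0;
    int calls = 0;

    while (off < len) {
        off += copy_with_yield(dst + off, src + off, len - off,
            contended, arg);
        calls++;
    }
    return (calls);
}
```

From the application's point of view this looks the same as the time
limit Rick proposed: the call may return fewer bytes than requested, and
the caller loops until the whole range has been copied.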
>
> rick
>
> > As long as the read
> > position + length is before the write position, there is no reason to
> > block the read.  Running "cat outfile" would look a lot like tail -f
> > because cat would only see the new data because it would temporarily
> > block if it ever caught up with the copy.
> >
> > tail is a bit funky, though.  If the size of the destination file is
> > updated periodically during the copy, tail could return early with an
> > earlier part of the file.  If the size is updated immediately to the
> > final size, then tail will wait for the copy to complete, but will
> > output the true end of the file.
> >
> > What about backups?
>