From: Rick Macklem
Date: Fri, 22 Aug 2025 08:24:20 -0700
Subject: Re: NFSv4.2 READ_PLUS support?
To: Cedric Blancher
Cc: freebsd-hackers@freebsd.org

On Fri, Aug 22, 2025 at 7:38 AM Cedric Blancher wrote:
>
> On Fri, 22 Aug 2025 at 16:29, Rick Macklem wrote:
> >
> > On Fri, Aug 22, 2025 at 6:58 AM Konstantin Belousov wrote:
> > >
> > > On Fri, Aug 22, 2025 at 06:41:23AM -0700, Rick Macklem wrote:
> > > > On Fri, Aug 22, 2025 at 6:31 AM Cedric Blancher wrote:
> > > > >
> > > > > Good afternoon!
> > > > >
> > > > > Is it planned to support NFSv4.2 READ_PLUS, to optimise reading of sparse files?
> > > > Not at this time. There is no VOP_READPLUS() vnode operation defined
> > > > at this time.
> > > > Without this, the NFS server must either...
> > > > - Read all the data and then "parse out" the blobs of zeros.
> > > > or
> > > > - Use SEEK_DATA/SEEK_HOLE.
> > > >   This sounds reasonable, but it currently needs
> > > >   to be done with the vnode unlocked, and dropping/re-acquiring the vnode lock
> > > >   during a Read operation makes things awkward.
> > > >   (The unlocked requirement is really just for other things that are done via
> > > >   VOP_IOCTL().)
> > > >
> > > > Bottom line, I've missed the FreeBSD-15 deadline for adding any new
> > > > VOP_xxx() calls and this needs one. (Either a VOP_SEEK() that can do
> > > > SEEK_DATA/SEEK_HOLE with the vnode locked or preferably a
> > > > VOP_READPLUS(), which can acquire data+holes in whatever is the
> > > > most efficient way the underlying fs can do it.)
> > > >
> > > > So, maybe for FreeBSD-16, but not yet, rick
> > >
> > > We certainly can add a new VOP to stable, this should not be a problem.
> > > First, we have spare VOPs in the vop vtable.
> > > Second, we do not guarantee KBI stability for VFS. We try to provide it,
> > > but not too hard. If there are benefits like that, KBI can be broken: we
> > > did it many times already.
> > >
> > Ok. I didn't think this was allowed. I'll admit the case of VOP_READPLUS()
> > looks like it might be a lot of work for the underlying file system
> > implementations, so FreeBSD-16 is still a pretty good guess.
> >
> > There are also performance questions, in part because of my lack of
> > understanding of ZFS.
> > - I do know that sync'ing to get an accurate seek_data/seek_hole is a
> >   big performance hit (turned off via vfs.zfs.dmu_offset_next_sync=0).
> >   And then, since the files are usually compressed...
> >   "is there an efficient way to uncompress and mark the holes in a large
> >   sparse file?"
> >   And what about large slightly sparse files? (Mostly data with a few small
> >   holes.)
> >   Even deciding if a file is sparse cannot be simply done by comparing
> >   va_size with va_bytes when the file is compressed.
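For readers unfamiliar with the SEEK_DATA/SEEK_HOLE option mentioned above, here is a minimal userland sketch of how those lseek(2) whence values walk the data regions of a file. It is an illustration under stated assumptions, not the NFS server code: it works on an ordinary file descriptor in userland rather than on a locked vnode in the kernel (which is exactly what would need a new VOP such as the proposed VOP_SEEK()), and map_data_regions() and struct data_region are hypothetical names, not FreeBSD interfaces.

```c
#define _GNU_SOURCE /* glibc needs this for SEEK_DATA/SEEK_HOLE; no effect on FreeBSD */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical type: one data region [off, off + len) of a file. */
struct data_region {
	off_t off;
	off_t len;
};

/*
 * Hypothetical helper: store at most 'max' data regions of fd that lie in
 * [start, end) into 'out'.  Whatever lies between two regions is a hole as
 * far as the file system reports.  Returns the number of regions stored,
 * or -1 on error.  Note that lseek() also moves the file offset.
 */
static int
map_data_regions(int fd, off_t start, off_t end, struct data_region *out,
    int max)
{
	int n = 0;
	off_t pos = start;

	assert(start >= 0 && start <= end && max >= 0);
	while (pos < end && n < max) {
		/* Find the first byte of data at or after pos. */
		off_t data = lseek(fd, pos, SEEK_DATA);
		if (data == -1) {
			if (errno == ENXIO)	/* no more data after pos */
				break;
			return (-1);
		}
		if (data >= end)
			break;
		/* Find where that data region ends (EOF counts as a hole). */
		off_t hole = lseek(fd, data, SEEK_HOLE);
		if (hole == -1)
			return (-1);
		if (hole > end)
			hole = end;
		out[n].off = data;
		out[n].len = hole - data;
		n++;
		pos = hole;
	}
	return (n);
}
```

File systems that do not track holes typically report the whole file as a single data region, so a caller should treat the result as a hint about sparseness rather than a guarantee. On ZFS, whether recently created holes are reported depends on vfs.zfs.dmu_offset_next_sync, as discussed in this thread.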
> >
> > To be honest, I'd rather have a way to send the compressed file
> > data (which would pretty well compress the holes out) on the wire
> > than just data+holes (which is what the NFSv4.2 ReadPlus does),
> > but that isn't in the 4.2 RFC and would be a lot of work to get through
> > the IETF committee as an extension.
>
> Holes are not sequences of 0x00 bytes. Holes mean "no data here". ZFS
> compression should preserve the sparse information, otherwise you turn
> ANY sequence of 0x00 bytes into holes, and that will break databases
> and other applications which depend on exactly that *precise*
> semantics.

Yes. ZFS retains the hole information but, as you note, that would
need to be done "on-the-wire" as well.
(I don't intend to try to come up with an extension to NFSv4.2 for
compressed file data, so this idea of compressed data on-the-wire
was just "dreaming".)

There is a performance problem for ZFS related to holes and recently
written data: if vfs.zfs.dmu_offset_next_sync=1, recently created holes
will be found, but it really slows things down.

To get this right, it will take someone who really knows ZFS to figure
out how to do a VOP_READPLUS() well. The fallback is a VOP_SEEK(), which
is easy to do and relatively easy for the NFS server to use, but there
will be a big performance tradeoff, based on the setting of
vfs.zfs.dmu_offset_next_sync.

rick

>
> Ced
> --
> Cedric Blancher
> [https://plus.google.com/u/0/+CedricBlancher/]
> Institute Pasteur
>
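Rick's first option, reading all the data and "parsing out" the blobs of zeros, amounts to splitting a buffer into the alternating data and hole segments that an NFSv4.2 READ_PLUS reply (RFC 7862) carries. The sketch below is a hedged illustration, not FreeBSD code: split_zero_runs(), struct segment, enum seg_kind and the min_hole threshold are all hypothetical, and the two segment kinds only loosely mirror READ_PLUS's data and hole content types. It also makes Cedric's objection concrete: any long enough run of 0x00 bytes gets reported as a hole, whether or not the file system actually has storage allocated for it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical segment kinds, loosely mirroring READ_PLUS data/hole content. */
enum seg_kind { SEG_DATA, SEG_HOLE };

struct segment {
	enum seg_kind kind;
	uint64_t off;	/* file offset where the segment starts */
	uint64_t len;	/* segment length in bytes */
};

/* Length of the run of zero bytes starting at buf[i]. */
static size_t
zero_run(const unsigned char *buf, size_t len, size_t i)
{
	size_t j = i;

	while (j < len && buf[j] == 0)
		j++;
	return (j - i);
}

/*
 * Split buf, which was read from file offset 'base', into alternating
 * DATA and HOLE segments.  Zero runs shorter than min_hole stay inside
 * DATA segments so a few stray zeros do not fragment the reply.
 * Returns the number of segments stored in out, or -1 if more than
 * 'max' segments would be needed.
 */
static int
split_zero_runs(const unsigned char *buf, size_t len, uint64_t base,
    size_t min_hole, struct segment *out, int max)
{
	int n = 0;
	size_t i = 0;

	assert(min_hole > 0);
	while (i < len) {
		size_t run = zero_run(buf, len, i);
		size_t j;
		enum seg_kind kind;

		if (run >= min_hole) {
			/* Long zero run: report it as a hole. */
			kind = SEG_HOLE;
			j = i + run;
		} else {
			/* Data: absorb bytes and short zero runs until a long run. */
			kind = SEG_DATA;
			j = i;
			while (j < len) {
				size_t r = zero_run(buf, len, j);
				if (r >= min_hole)
					break;
				j += (r > 0) ? r : 1;
			}
		}
		if (n == max)
			return (-1);
		out[n].kind = kind;
		out[n].off = base + i;
		out[n].len = j - i;
		n++;
		i = j;
	}
	return (n);
}
```

Choosing min_hole trades reply size against fragmentation: a small value turns short zero runs inside mostly-data files into many tiny segments, while every byte still has to be read from the file system first, which is the cost a VOP_READPLUS() or SEEK_DATA/SEEK_HOLE approach would avoid.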