Re: NFSv4.2 READ_PLUS support?

From: Rick Macklem <rick.macklem_at_gmail.com>
Date: Fri, 22 Aug 2025 15:24:20 UTC
On Fri, Aug 22, 2025 at 7:38 AM Cedric Blancher
<cedric.blancher@gmail.com> wrote:
>
> On Fri, 22 Aug 2025 at 16:29, Rick Macklem <rick.macklem@gmail.com> wrote:
> >
> > On Fri, Aug 22, 2025 at 6:58 AM Konstantin Belousov <kostikbel@gmail.com> wrote:
> > >
> > > On Fri, Aug 22, 2025 at 06:41:23AM -0700, Rick Macklem wrote:
> > > > On Fri, Aug 22, 2025 at 6:31 AM Cedric Blancher
> > > > <cedric.blancher@gmail.com> wrote:
> > > > >
> > > > > Good afternoon!
> > > > >
> > > > > Is it planned to support NFSv4.2 READ_PLUS, to optimise reading of sparse files?
> > > > Not at this time. There is currently no VOP_READPLUS() vnode operation
> > > > defined.
> > > > Without this, the NFS server must either...
> > > > - Read all the data and then "parse out" the blobs of zeros.
> > > > or
> > > > - Use SEEK_DATA/SEEK_HOLE. This sounds reasonable, but it currently needs
> > > >   to be done with the vnode unlocked and dropping/re-acquiring the vnode lock
> > > >   during a Read operation makes things awkward.
> > > >   (The unlocked requirement is really just for other things that are done via
> > > >     VOP_IOCTL().)
> > > >
> > > > Bottom line, I've missed the FreeBSD-15 deadline for adding any new
> > > > VOP_xxx() calls and this needs one. (Either a VOP_SEEK() that can do
> > > > SEEK_DATA/SEEK_HOLE with the vnode locked or preferably a
> > > > VOP_READPLUS(), which can acquire data+holes in whatever is the
> > > > most efficient way the underlying fs can do it.)
> > > >
> > > > So, maybe for FreeBSD-16, but not yet, rick
> > >
> > > We certainly can add a new VOP to stable, this should not be a problem.
> > > First, we have spare VOPs in the vop vtable.
> > > Second, we do not guarantee KBI stability for VFS.  We try to provide it,
> > > but not too hard.  If there are benefits like that, the KBI can be broken;
> > > we have done it many times already.
> > >
> > Ok. I didn't think this was allowed. I'll admit the case of VOP_READPLUS()
> > looks like it might be a lot of work for the underlying file system
> > implementations,
> > so FreeBSD-16 is still a pretty good guess.
> >
> > There are also performance questions, in part because of my lack of
> > understanding of ZFS.
> > - I do know that sync'ing to get an accurate seek_data/seek_hole is a
> >   big performance hit (turned off via vfs.zfs.dmu_offset_next_sync=0).
> > And then, since the files are usually compressed...
> > "is there an efficient way to uncompress and mark the holes in a large
> > sparse file?"
> > And what about large slightly sparse files? (Mostly data with a few small
> > holes.)
> > Even deciding whether a file is sparse cannot simply be done by comparing
> > va_size with va_bytes when the file is compressed.
> >
> > To be honest, I'd rather have a way to send the compressed file
> > data (which would pretty well compress the holes out) on the wire
> > than just data+holes (which is what the NFSv4.2 ReadPlus does),
> > but that isn't in the 4.2 RFC and would be a lot of work to get through
> > the IETF committee as an extension.
>
> Holes are not sequences of 0x00 bytes. A hole means "no data here". ZFS
> compression should preserve the sparse information; otherwise you turn
> ANY sequence of 0x00 bytes into holes, and that will break databases
> and other applications which depend on exactly those *precise*
> semantics.
Yes. ZFS retains the hole information but, as you note, that would need
to be preserved "on-the-wire" as well. (I don't intend to try to come up with
an extension to NFSv4.2 for compressed file data, so this idea of
compressed data on-the-wire was just "dreaming".)

There is a performance problem for ZFS related to holes and recently
written data (if vfs.zfs.dmu_offset_next_sync=1, recently created holes
will be found, but it really slows things down).
To get this right, it will take someone who really knows ZFS to figure
out how to do a VOP_READPLUS() well.
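
(For anyone experimenting with this: the knob is a sysctl, so it can be
inspected and, at least with recent OpenZFS, changed at runtime, e.g.

  # sysctl vfs.zfs.dmu_offset_next_sync
  # sysctl vfs.zfs.dmu_offset_next_sync=0

where 1 gives accurate SEEK_DATA/SEEK_HOLE results for recently written
files at a real cost, and 0 is fast but may report recently created
holes as data.)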

The fallback is a VOP_SEEK(), which is easy to do and relatively
easy for the NFS server to use, but there will be a big performance
tradeoff, based on the setting of vfs.zfs.dmu_offset_next_sync.
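
To make the semantics concrete, here is a minimal userland sketch (not
kernel code, just an illustration) that walks a file's layout with
lseek(2) SEEK_DATA/SEEK_HOLE. The data/hole segments it prints are
roughly what a READ_PLUS reply encodes, and what a VOP_SEEK()-based
server loop would have to discover in the kernel:

#include <sys/types.h>
#include <err.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
	off_t cur = 0, data, hole;
	int fd;

	if (argc != 2)
		errx(1, "usage: %s file", argv[0]);
	if ((fd = open(argv[1], O_RDONLY)) < 0)
		err(1, "open %s", argv[1]);
	for (;;) {
		/* Next byte of data at or after cur; fails (ENXIO) past the last data. */
		data = lseek(fd, cur, SEEK_DATA);
		if (data == -1)
			break;
		if (data > cur)
			printf("hole: %jd-%jd\n", (intmax_t)cur, (intmax_t)data);
		/* End of this data segment (EOF counts as a virtual hole). */
		hole = lseek(fd, data, SEEK_HOLE);
		if (hole == -1)
			err(1, "SEEK_HOLE");
		printf("data: %jd-%jd\n", (intmax_t)data, (intmax_t)hole);
		cur = hole;
	}
	close(fd);
	return (0);
}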

rick

>
> Ced
> --
> Cedric Blancher <cedric.blancher@gmail.com>
> [https://plus.google.com/u/0/+CedricBlancher/]
> Institute Pasteur
>