Re: NFSv4.2 READ_PLUS support?

From: Cedric Blancher <cedric.blancher_at_gmail.com>
Date: Fri, 22 Aug 2025 14:37:13 UTC
On Fri, 22 Aug 2025 at 16:29, Rick Macklem <rick.macklem@gmail.com> wrote:
>
> On Fri, Aug 22, 2025 at 6:58 AM Konstantin Belousov <kostikbel@gmail.com> wrote:
> >
> > On Fri, Aug 22, 2025 at 06:41:23AM -0700, Rick Macklem wrote:
> > > On Fri, Aug 22, 2025 at 6:31 AM Cedric Blancher
> > > <cedric.blancher@gmail.com> wrote:
> > > >
> > > > Good afternoon!
> > > >
> > > > Is it planned to support NFSv4.2 READ_PLUS, to optimise reading of sparse files?
> > > Not at this time. There is no VOP_READPLUS() vnode operation defined
> > > at this time.
> > > Without this, the NFS server must either...
> > > - Read all the data and then "parse out" the blobs of zeros.
> > > or
> > > - Use SEEK_DATA/SEEK_HOLE. This sounds reasonable, but it currently needs
> > >   to be done with the vnode unlocked and dropping/re-acquiring the vnode lock
> > >   during a Read operation makes things awkward.
> > >   (The unlocked requirement is really just for other things that are done via
> > >     VOP_IOCTL().)
> > >
> > > Bottom line, I've missed the FreeBSD-15 deadline for adding any new
> > > VOP_xxx() calls and this needs one. (Either a VOP_SEEK() that can do
> > > SEEK_DATA/SEEK_HOLE with the vnode locked or preferably a
> > > VOP_READPLUS(), which can acquire data+holes in whatever is the
> > > most efficient way the underlying fs can do it.)
> > >
> > > So, maybe for FreeBSD-16, but not yet, rick
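For illustration only, the SEEK_DATA/SEEK_HOLE approach Rick describes can be sketched from userland with lseek(2). This is not the FreeBSD nfsd code and it ignores the vnode-locking problem he raises; `struct segment` and `list_segments()` are hypothetical names. The sketch walks the file and records alternating data/hole extents, roughly the information a READ_PLUS reply carries.

```c
#define _GNU_SOURCE  /* glibc exposes SEEK_DATA/SEEK_HOLE only with this; FreeBSD does by default */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA    /* fallback if headers were included before _GNU_SOURCE; same values on Linux and FreeBSD */
#define SEEK_DATA 3
#define SEEK_HOLE 4
#endif

/* One extent of the file: either data or a hole (hypothetical type). */
struct segment {
    int   is_hole;
    off_t offset;
    off_t length;
};

/*
 * Hypothetical helper: describe [0, st_size) of the file at `path` as
 * alternating data/hole extents using lseek(2) SEEK_DATA/SEEK_HOLE.
 * Returns the number of extents stored in seg[], or -1 on error or if
 * more than `max` extents would be needed.
 */
static int list_segments(const char *path, struct segment *seg, int max)
{
    assert(max > 0);
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    struct stat st;
    if (fstat(fd, &st) == -1) {
        close(fd);
        return -1;
    }

    off_t size = st.st_size, pos = 0;
    int n = 0;
    while (pos < size) {
        if (n == max) { n = -1; break; }          /* out of room in seg[] */

        off_t data = lseek(fd, pos, SEEK_DATA);   /* next data at or after pos */
        if (data == -1 && errno == ENXIO)
            data = size;                          /* no data after pos: trailing hole */
        else if (data == -1) { n = -1; break; }
        if (data > size)
            data = size;                          /* file shrank under us */

        if (data > pos) {                         /* pos sits inside a hole */
            seg[n++] = (struct segment){ 1, pos, data - pos };
            pos = data;
            continue;
        }

        off_t hole = lseek(fd, pos, SEEK_HOLE);   /* end of this data run */
        if (hole == -1 || hole <= pos) { n = -1; break; }
        if (hole > size)
            hole = size;
        seg[n++] = (struct segment){ 0, pos, hole - pos };
        pos = hole;
    }
    close(fd);
    return n;
}
```

A server built this way would pread() only the data extents and send the hole extents as (offset, length) pairs. How lseek(2) behaves on a file system without hole support varies by OS, so treat that case as an assumption to check.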
> >
> > We certainly can add a new VOP to stable, this should not be a problem.
> > First, we have spare VOPs in the vop vtable.
> > Second, we do not guarantee KBI stability for VFS.  We try to provide it,
> > but not too hard.  If there are benefits like that, KBI can be broken: we
> > did it many times already.
> >
> Ok. I didn't think this was allowed. I'll admit the case of VOP_READPLUS()
> looks like it might be a lot of work for the underlying file system
> implementations, so FreeBSD-16 is still a pretty good guess.
>
> There are also performance questions, in part because of my lack of
> understanding of ZFS.
> - I do know that sync'ing to get an accurate seek_data/seek_hole is a
>   big performance hit (turned off via vfs.zfs.dmu_offset_next_sync=0).
> And then, since the files are usually compressed:
> "is there an efficient way to uncompress and mark the holes in a large
> sparse file?"
> And what about large slightly sparse files? (Mostly data with a few small
> holes.)
> Even deciding if a file is sparse cannot be simply done by comparing
> va_size with va_bytes when the file is compressed.
>
> To be honest, I'd rather have a way to send the compressed file
> data (which would pretty well compress the holes out) on the wire
> than just data+holes (which is what the NFSv4.2 ReadPlus does),
> but that isn't in the 4.2 RFC and would be a lot of work to get through
> the IETF committee as an extension.

Holes are not sequences of 0x00 bytes. A hole means "no data here". ZFS
compression should preserve the sparse information; otherwise you turn
ANY sequence of 0x00 bytes into a hole, and that will break databases
and other applications which depend on exactly that *precise*
semantics.

Ced
-- 
Cedric Blancher <cedric.blancher@gmail.com>
[https://plus.google.com/u/0/+CedricBlancher/]
Institute Pasteur