Re: NFSv4.2 READ_PLUS support?

From: Cedric Blancher <cedric.blancher_at_gmail.com>
Date: Fri, 22 Aug 2025 14:16:45 UTC
On Fri, 22 Aug 2025 at 15:59, Konstantin Belousov <kostikbel@gmail.com> wrote:
>
> On Fri, Aug 22, 2025 at 06:41:23AM -0700, Rick Macklem wrote:
> > On Fri, Aug 22, 2025 at 6:31 AM Cedric Blancher
> > <cedric.blancher@gmail.com> wrote:
> > >
> > > Good afternoon!
> > >
> > > Is it planned to support NFSv4.2 READ_PLUS, to optimise reading of sparse files?
> > Not at this time. There is no VOP_READPLUS() vnode operation defined
> > at this time.
> > Without this, the NFS server must either...
> > - Read all the data and then "parse out" the blobs of zeros.
> > or
> > - Use SEEK_DATA/SEEK_HOLE. This sounds reasonable, but it currently needs
> >   to be done with the vnode unlocked and dropping/re-acquiring the vnode lock
> >   during a Read operation makes things awkward.
> >   (The unlocked requirement is really just for other things that are done via
> >     VOP_IOCTL().)
> >
> > Bottom line, I've missed the FreeBSD-15 deadline for adding any new
> > VOP_xxx() calls and this needs one. (Either a VOP_SEEK() that can do
> > SEEK_DATA/SEEK_HOLE with the vnode locked or preferably a
> > VOP_READPLUS(), which can acquire data+holes in whatever is the
> > most efficient way the underlying fs can do it.)
> >
> > So, maybe for FreeBSD-16, but not yet, rick
>
> We certainly can add a new VOP to stable, this should not be a problem.
> First, we have spare VOPs in the vop vtable.
> Second, we do not guarantee KBI stability for VFS.  We try to provide it,
> but not too hard.  If there are benefits like that, KBI can be broken: we
> did it many times already.

That might be cool, especially if this could be exposed as an
(experimental) userland API for databases.

The hard part is that READ_PLUS returns an array of unions of
{ record_type, { data_pos,data_size,data_bytes }, { hole_pos,hole_size} },
which can technically have up to (read_request_size/$MIN_HOLE)+1
entries, where $MIN_HOLE is the minimum size of a sparse file hole.
record_type could be either DATA or HOLE, but in the future could hold
more types (like application-specific data; the NFSv4.2 RFC debates
that for READ_PLUS).
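
To make the worst-case bound above concrete, here is a minimal sketch of
the arithmetic only. The function name readplus_max_segments is made up
for illustration and is not part of any existing or proposed API.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: worst-case number of DATA/HOLE records for one
 * read, using the bound from the text:
 *     (read_request_size / min_hole) + 1
 */
static size_t
readplus_max_segments(size_t read_request_size, size_t min_hole)
{
    assert(min_hole > 0);   /* a zero hole size would divide by zero */
    return read_request_size / min_hole + 1;
}
```

For example, a 64 KiB read with a 512-byte minimum hole size would need
room for 129 records under this bound.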

So two APIs are needed:
int getreadplusmaxarray(int fd, int flags);
    /* flags just for future extensions */
int readplus(int fd, void *buf, size_t nbytes, union *readplusbuffers,
    size_t num_readplusbuffers, size_t *read_num_readplusbuffers,
    int flags);

fd is the file descriptor to read from
buf is the memory used to store data
nbytes is the size of the memory for buf
readplusbuffers is an array of unions { {
data_pos,data_size,data_bytes }, { hole_pos,hole_size} }
num_readplusbuffers is the maximum number of elements in readplusbuffers
read_num_readplusbuffers is the number of elements filled in
flags is for extensions
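
A rough sketch of how the record array might look in C, and how a caller
could consume it. All type, field-wrapper and function names here
(readplus_rec, READPLUS_DATA, READPLUS_HOLE, readplus_expand) are
assumptions made for illustration; only the field names data_pos,
data_size, data_bytes, hole_pos and hole_size come from the description
above. The sketch assumes data_bytes points into the caller's buf. It
does not implement readplus() itself; it only shows a consumer turning
the returned records back into a flat byte range.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical record layout; not an existing API. */
enum readplus_type { READPLUS_DATA, READPLUS_HOLE };

struct readplus_rec {
    enum readplus_type record_type;
    union {
        struct {
            off_t data_pos;         /* file offset of the data */
            size_t data_size;       /* number of data bytes */
            const void *data_bytes; /* assumed to point into buf */
        } data;
        struct {
            off_t hole_pos;         /* file offset of the hole */
            size_t hole_size;       /* length of the hole */
        } hole;
    } u;
};

/*
 * Rebuild the flat range [base, base + out_size) from the records:
 * copy DATA records, zero-fill HOLE records.  Returns 0 on success,
 * -1 on an unknown record type or a record outside the output range.
 */
static int
readplus_expand(const struct readplus_rec *recs, size_t nrecs,
    off_t base, unsigned char *out, size_t out_size)
{
    assert(recs != NULL || nrecs == 0);
    for (size_t i = 0; i < nrecs; i++) {
        off_t pos;
        size_t len;

        switch (recs[i].record_type) {
        case READPLUS_DATA:
            pos = recs[i].u.data.data_pos;
            len = recs[i].u.data.data_size;
            break;
        case READPLUS_HOLE:
            pos = recs[i].u.hole.hole_pos;
            len = recs[i].u.hole.hole_size;
            break;
        default:
            return -1;  /* future record types are not handled here */
        }
        if (pos < base)
            return -1;
        size_t off = (size_t)(pos - base);
        if (off > out_size || len > out_size - off)
            return -1;
        if (recs[i].record_type == READPLUS_DATA)
            memcpy(out + off, recs[i].u.data.data_bytes, len);
        else
            memset(out + off, 0, len);
    }
    return 0;
}
```

A database would more likely skip the HOLE records entirely instead of
zero-filling them; the expansion is only there to show that the records
carry enough information to reproduce what a plain read() would return.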

OT: /bin/getconf MIN_HOLE $PATH does not work on FreeBSD. It works on
Solaris and Illumos.
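
For reference, a sketch of querying the same value from C, assuming the
value meant here is the one pathconf(2) exposes as _PC_MIN_HOLE_SIZE.
The wrapper name min_hole_size is made up, and the #ifdef is there
because not every libc defines that constant.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: minimum hole size for the file system holding
 * path, or -1 if it is unknown or unsupported.
 */
static long
min_hole_size(const char *path)
{
    assert(path != NULL);
#ifdef _PC_MIN_HOLE_SIZE
    return pathconf(path, _PC_MIN_HOLE_SIZE);
#else
    (void)path;     /* constant not available on this platform */
    return -1;
#endif
}
```

Per the FreeBSD pathconf(2) manual page, a return value of 1 means the
file system reports holes but does not specify a minimum hole size.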

Ced
-- 
Cedric Blancher <cedric.blancher@gmail.com>
[https://plus.google.com/u/0/+CedricBlancher/]
Institute Pasteur