Re: NFSv4.2 READ_PLUS support?
- Reply: Rob Norris: "Re: NFSv4.2 READ_PLUS support?"
- In reply to: Cedric Blancher : "Re: NFSv4.2 READ_PLUS support?"
Date: Fri, 22 Aug 2025 15:24:20 UTC
On Fri, Aug 22, 2025 at 7:38 AM Cedric Blancher <cedric.blancher@gmail.com> wrote:
>
> On Fri, 22 Aug 2025 at 16:29, Rick Macklem <rick.macklem@gmail.com> wrote:
> >
> > On Fri, Aug 22, 2025 at 6:58 AM Konstantin Belousov <kostikbel@gmail.com> wrote:
> > >
> > > On Fri, Aug 22, 2025 at 06:41:23AM -0700, Rick Macklem wrote:
> > > > On Fri, Aug 22, 2025 at 6:31 AM Cedric Blancher
> > > > <cedric.blancher@gmail.com> wrote:
> > > > >
> > > > > Good afternoon!
> > > > >
> > > > > Is it planned to support NFSv4.2 READ_PLUS, to optimise reading of sparse files?
> > > > Not at this time. There is no VOP_READPLUS() vnode operation defined
> > > > at this time.
> > > > Without this, the NFS server must either...
> > > > - Read all the data and then "parse out" the blobs of zeros.
> > > > or
> > > > - Use SEEK_DATA/SEEK_HOLE. This sounds reasonable, but it currently needs
> > > > to be done with the vnode unlocked and dropping/re-acquiring the vnode lock
> > > > during a Read operation makes things awkward.
> > > > (The unlocked requirement is really just for other things that are done via
> > > > VOP_IOCTL().)
> > > >
> > > > Bottom line, I've missed the FreeBSD-15 deadline for adding any new
> > > > VOP_xxx() calls and this needs one. (Either a VOP_SEEK() that can do
> > > > SEEK_DATA/SEEK_HOLE with the vnode locked or preferably a
> > > > VOP_READPLUS(), which can acquire data+holes in whatever is the
> > > > most efficient way the underlying fs can do it.)
> > > >
> > > > So, maybe for FreeBSD-16, but not yet, rick
> > >
> > > We certainly can add a new VOP to stable, this should not be a problem.
> > > First, we have spare VOPs in the vop vtable.
> > > Second, we do not guarantee KBI stability for VFS. We try to provide it,
> > > but not too hard. If there are benefits like that, KBI can be broken: we
> > > did it many times already.
> > >
> > Ok. I didn't think this was allowed.
> > I'll admit the case of VOP_READPLUS()
> > looks like it might be a lot of work for the underlying file system
> > implementations,
> > so FreeBSD-16 is still a pretty good guess.
> >
> > There are also performance questions, in part because of my lack of
> > understanding of ZFS.
> > - I do know that sync'ing to get an accurate seek_data/seek_hole is a
> > big performance hit (turned off via vfs.zfs.dmu_offset_next_sync=0).
> > And then, since the files are usually compressed..
> > "is there an efficient way to uncompress and mark the holes in a large
> > sparse file?"
> > And what about large slightly sparse files? (Mostly data with a few small
> > holes.)
> > Even deciding if a file is sparse cannot be simply done by comparing
> > va_size with va_bytes when the file is compressed.
> >
> > To be honest, I'd rather have a way to send the compressed file
> > data (which would pretty well compress the holes out) on the wire
> > than just data+holes (which is what the NFSv4.2 ReadPlus does),
> > but that isn't in the 4.2 RFC and would be a lot of work to get through
> > the IETF committee as an extension.
>
> Holes are not sequences of 0x00 bytes. Holes means "no data here". ZFS
> compression should preserve the sparse information, otherwise you turn
> ANY sequence of 0x00 bytes into holes, and that will break databases
> and other applications which depend on exactly that *precise*
> semantics.
Yes. ZFS retains the hole information but, as you note, that would need
to be done "on-the-wire" as well. (I don't intend to try and come up
with an extension to NFSv4.2 for compressed file data, so this idea of
compressed data on-the-wire was just "dreaming".)

There is a performance problem for ZFS related to holes and recently
written data (if vfs.zfs.dmu_offset_next_sync=1, recently created holes
will be found, but it really slows things down).
To get this right, it will take someone who really knows ZFS to figure
out how to do a VOP_READPLUS() well.
The fallback is a VOP_SEEK(), which is easy to do and relatively easy
for the NFS server to use, but there will be a big performance tradeoff,
based on the setting of vfs.zfs.dmu_offset_next_sync.

rick

>
> Ced
> --
> Cedric Blancher <cedric.blancher@gmail.com>
> [https://plus.google.com/u/0/+CedricBlancher/]
> Institute Pasteur
>
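
For readers following the thread, here is a rough userland sketch of the
two options listed above for a server without VOP_READPLUS(): scanning
the data that was read for zero blocks, and walking the file with
SEEK_DATA/SEEK_HOLE. It is Python rather than kernel code, and it is not
how the FreeBSD NFS server does (or would do) this. The function names
and the 4096-byte block size are made up for illustration. The zero scan
labels its runs "zero" rather than "hole" on purpose, since, as noted in
the thread, a run of 0x00 bytes is not necessarily a hole.

```python
import errno
import os


def zero_run_segments(buf, block=4096):
    """Option 1: classify block-sized chunks of already-read data.

    Returns a list of (kind, offset, length) where kind is "data" or
    "zero". Adjacent chunks of the same kind are merged. The block size
    is an arbitrary choice for this sketch.
    """
    segments = []
    for start in range(0, len(buf), block):
        chunk = buf[start:start + block]
        # any() over bytes is True if at least one byte is non-zero.
        kind = "data" if any(chunk) else "zero"
        if segments and segments[-1][0] == kind:
            k, off, length = segments[-1]
            segments[-1] = (k, off, length + len(chunk))
        else:
            segments.append((kind, start, len(chunk)))
    return segments


def seek_segments(fd, size):
    """Option 2: ask the file system via lseek(SEEK_DATA/SEEK_HOLE).

    Yields (kind, offset, length) with kind "data" or "hole", covering
    the range [0, size). A file system without hole support is expected
    to report the whole file as one data segment.
    """
    offset = 0
    while offset < size:
        try:
            data_start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as e:
            if e.errno != errno.ENXIO:
                raise
            # ENXIO: no data at or after offset, so the rest is a hole.
            yield ("hole", offset, size - offset)
            return
        if data_start > offset:
            yield ("hole", offset, data_start - offset)
        # There is always an implicit hole at end-of-file, so this
        # returns an offset greater than data_start.
        hole_start = os.lseek(fd, data_start, os.SEEK_HOLE)
        yield ("data", data_start, hole_start - data_start)
        offset = hole_start
```

Something like list(seek_segments(fd, os.fstat(fd).st_size)) gives the
data/hole layout as the file system reports it. On ZFS, whether recently
written data and recently created holes show up accurately depends on
vfs.zfs.dmu_offset_next_sync, which is the performance tradeoff
described above.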