Re: Implementing VOP_READPLUS() in FreeBSD 15?

In reply to: Aurélien_Couderc : "Implementing VOP_READPLUS() in FreeBSD 15?"
From: Rick Macklem <rick.macklem_at_gmail.com>
Date: Fri, 07 Nov 2025 02:01:36 UTC
On Thu, Nov 6, 2025 at 11:40 AM Aurélien Couderc
<aurelien.couderc2002@gmail.com> wrote:
>
> This is a followup to a discussion with the nfs-ganesha developers.
>
> Could FreeBSD implement a VOP_READPLUS() in FreeBSD 15, please?
>
> Citing Lionel Cons/CERN:
> > But the point is to optimise the read(). First, you have less traffic over the wire (which is a
> > thing if your reads are in the gigabyte range for large VMs), and it tells the VM host that it
> > can just map all those MMU pages representing the hole to the "default zero page", which
> > in turn saves lots of space in the L3 and L2 caches ----> THIS DOES WONDERS to VM
> > performance.
> >
> > Example:
> > The performance benefit here comes from the fact that instead of mapping a 1TB hole
> > (1099511627776 bytes) to 524288 individual 2M pages (x86 2M hugepage size), and then
> > potentially reading from them, you just have ONE 2M page in the cache, and all reads come
> > from that.
> >
> > READ_PLUS is THE game changer for that kind of application, especially in our case (HPC
> > simulations).
Why doesn't the application use lseek(SEEK_DATA/SEEK_HOLE) and only read(2) the
data segments?

This is implemented now in FreeBSD and in several other POSIX-like OSs
and avoids problems like filling the buffer cache with blocks of all
zeros or returning a lot of blocks with all zeros to the application
via read(2).
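
For illustration, here is a minimal sketch of that approach in C. The
helper names read_data_segments() and read_data_segments_path() are
hypothetical (not part of any existing API), and the 64K buffer size is
an arbitrary choice. It assumes the underlying file system supports
SEEK_DATA/SEEK_HOLE; where it does not, lseek(2) may fail and the
helpers simply return -1. EINTR handling is omitted for brevity.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* exposes SEEK_DATA/SEEK_HOLE on glibc; not needed on FreeBSD */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: visit only the data segments of fd, skipping holes.
 * Returns the number of bytes read from data segments, or -1 on error.
 */
static off_t
read_data_segments(int fd)
{
	static char buf[65536];		/* arbitrary buffer size */
	off_t data = 0, hole, total = 0;

	assert(fd >= 0);
	for (;;) {
		/* Find the next data region at or after offset "data". */
		data = lseek(fd, data, SEEK_DATA);
		if (data == -1) {
			/* ENXIO means there is no data at or past this offset. */
			return (errno == ENXIO ? total : -1);
		}
		/* Find where that region ends; EOF acts as an implicit hole. */
		hole = lseek(fd, data, SEEK_HOLE);
		if (hole == -1)
			return (-1);

		while (data < hole) {
			size_t want = sizeof(buf);
			ssize_t n;

			if ((off_t)want > hole - data)
				want = (size_t)(hole - data);
			n = pread(fd, buf, want, data);
			if (n == -1)
				return (-1);
			if (n == 0)
				return (total);	/* file shrank while reading */
			/* A real application would process buf[0..n) here. */
			data += n;
			total += n;
		}
	}
}

/* Hypothetical convenience wrapper that opens the file by path. */
static off_t
read_data_segments_path(const char *path)
{
	off_t total;
	int fd = open(path, O_RDONLY);

	if (fd == -1)
		return (-1);
	total = read_data_segments(fd);
	close(fd);
	return (total);
}
```

On a file system that tracks holes, a file consisting of one large hole
makes the first lseek(SEEK_DATA) fail with ENXIO, so no read is issued
at all and no blocks of zeros are handed to the application.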

Right now, I am not aware of any readplus(2) syscall (please correct me
if I am wrong on this), so applications that read(2) sparse files
without bothering to do lseek(SEEK_DATA/SEEK_HOLE) will get a lot of 0s
to process.

To do VOP_READPLUS() is a lot of work. Once the VOP_READPLUS() is
defined, there need to be implementations in the various local fs (ZFS,
UFS, ...). That requires work by people who know these areas. I am only
minimally conversant with either ZFS or UFS and would not want to
attempt to do a good VOP_READPLUS() implementation for either of them.
(Without fs specific implementations, there isn't much point in doing
it, imho.)

If VOP_READPLUS() is done, but there is no readplus(2) syscall, then the
applications still get globs of 0s in the read(2) reply (assuming the
application doesn't bother to use lseek(SEEK_DATA/SEEK_HOLE) to skip
over the holes in a sparse file).
--> Even if FreeBSD were to "go out on a limb" and implement a
     readplus(2) syscall, who would use it? (Not anyone implementing
     a POSIX compliant application nor anyone implementing a Linux
     application.)
     --> Until Linux adds a syscall like readplus(2), if it ever does,
           I still question how useful VOP_READPLUS() is even if it has
           fs specific implementations.

At least that's how I see it, rick

>
> I just played with that:
>
> 1. Intel XEON with 512GB
> 2. loading 16 files with 64GB sparse files which are only holes
> 3. create kernel core dump
> Result: Almost all pages in the file cache are zero bytes.
>
> VOP_READPLUS() would optimize this case, and map all ranges belonging
> to sparse file holes into the same read-only MMU page representing a
> physical address range containing zero bytes. Because it's the same
> physical memory it would consume very little L2/L3 cache space, and
> save space in the filesystem cache too.
>
> Aurélien
> --
> Aurélien Couderc <aurelien.couderc2002@gmail.com>
> Big Data/Data mining expert, chess enthusiast
>