[Bug 276002] nfscl: data corruption using both copy_file_range and mmap'd I/O

From: <bugzilla-noreply_at_freebsd.org>
Date: Mon, 01 Jan 2024 21:55:10 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276002

Rick Macklem <rmacklem@FreeBSD.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |geoffrey@dommett.com

--- Comment #34 from Rick Macklem <rmacklem@FreeBSD.org> ---
Ok, here is my understanding of what currently can happen.
Hopefully Kostik will correct me if I have this wrong.

#1 - File is open(2)'d.
#2 - A byte range (let's say the first 100 Mbytes) is
     mmap(2)'d into the address space.
#3 - Some addresses within this address space are
     modified by the process, dirtying the corresponding
     pages.
#4 - File is read(2) sequentially.

Now, when #4 happens, there will be read-aheads done
by the nfsiod threads. These simply do Read RPCs against
the NFS server to read the byte ranges of the file into
the buffer cache blocks.
They are done asynchronously and without any vnode lock.
--> At this time, I do not see anything that stops these
    read-aheads from filling the buffer cache blocks/pages
    from the NFS server's now stale data.

Now, I thought adding an msync(2) with MS_SYNC between
#3 and #4 would be sufficient to cause the pages dirtied
by #3 to be written to the NFS server (via VOP_PUTPAGES(),
which is ncl_putpages()).
I believe that an fsync(2) between #3 and #4 will also
write the dirtied pages to the NFS server.
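
A minimal userland sketch of steps #1 to #4 with the msync(2) placed
between #3 and #4, as described above. The helper name
mmap_write_then_read(), the marker string and the file it creates are
made up for illustration; only the system calls themselves are real.
It shows the call ordering only and says nothing about whether that
ordering is sufficient against the NFS client's read-aheads.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the PR): dirty the start of a file
 * through a shared mapping, flush it with msync(MS_SYNC), then read
 * the file back with read(2).
 * Returns 0 if read(2) saw the modified bytes, 1 if it saw stale
 * data, and -1 on any error.
 */
int
mmap_write_then_read(const char *path, size_t maplen)
{
	static const char mark[] = "dirty-via-mmap";
	char buf[4096];
	size_t off = 0;
	ssize_t n;
	char *p;
	int fd, result = 1;

	if (maplen < sizeof(mark))
		return (-1);

	/* #1 - open(2) the file and size it to maplen bytes of zeros. */
	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return (-1);
	if (ftruncate(fd, (off_t)maplen) != 0) {
		close(fd);
		return (-1);
	}

	/* #2 - mmap(2) the byte range, shared so stores reach the file. */
	p = mmap(NULL, maplen, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		close(fd);
		return (-1);
	}

	/* #3 - modify the mapping, dirtying the first page. */
	memcpy(p, mark, sizeof(mark));

	/* Between #3 and #4: synchronously push the dirty pages out. */
	if (msync(p, maplen, MS_SYNC) != 0) {
		munmap(p, maplen);
		close(fd);
		return (-1);
	}

	/* #4 - read(2) the file sequentially from offset 0. */
	while (off < maplen) {
		n = read(fd, buf, sizeof(buf));
		if (n <= 0) {
			result = -1;
			break;
		}
		/* Only the first chunk holds the bytes stored in #3. */
		if (off == 0)
			result = ((size_t)n >= sizeof(mark) &&
			    memcmp(buf, mark, sizeof(mark)) == 0) ? 0 : 1;
		off += (size_t)n;
	}

	munmap(p, maplen);
	close(fd);
	return (result);
}
```

Calling mmap_write_then_read("/mnt/nfs/testfile", 1024 * 1024) (the
path is just an example) against an NFS mount and checking for a
return of 1 would be one way to look for stale reads. Replacing the
msync(2) call with fsync(fd) gives the other variant mentioned above.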

Without either an msync(2) or fsync(2) between #3 and #4,
what could be done to make this work?
- Don't do read-ahead.  This would be a major performance
  hit and is imho a non-starter.
- Don't do read-ahead when a file is mmap(2)'d. This sounds
  better, since it will be a rare case that a file will be
  both mmap(2)'d and read via read(2) syscalls.
  --> To do this, the NFS client needs to know if the file
      has been mmap(2)'d.
      A flag could be set on the vnode when the file is mmap(2)'d
      and that flag can be checked by the NFS client.
      --> The problem is when can the flag be cleared?
          My recollection from a previous round of discussing
          this is...not until all the process(es) that mmap(2)'d
          the file exit. (I cannot recall if the vnode's
          v_usecount going to 0 is sufficient.)
- Give the nfsiod threads some way to check whether there are
  dirty pages backing the buffer cache block and write those
  back to the NFS server before doing the read. (Recall that a
  buffer cache block spans quite a few pages, typically 128 Kbytes
  to 1 Mbyte in size.)
  --> This could be done by having the nfsiod thread LK_EXCLUSIVE lock
      the vnode, but that would be a major performance hit as well.

That's as far as I've gotten in previous discussions about this.

Note that this PR started with a specific problem related to
copy_file_range(2) and that has been fixed (or kib@'s patch will
fix it when committed).
As for the more general case above, I do not have an answer yet.
