[Bug 260453] ZFS truncated write to O_APPEND file from mmap'ed memory

From: <bugzilla-noreply_at_freebsd.org>
Date: Wed, 15 Dec 2021 22:09:55 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=260453

Mark Johnston <markj@FreeBSD.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|New                         |Open
                 CC|                            |markj@FreeBSD.org

--- Comment #1 from Mark Johnston <markj@FreeBSD.org> ---
I dug into this a little bit.  The program creates a file and writes some data
via write(2), so the data does not appear in the page cache, only in the DMU.
Immediately afterwards, the file is mapped for reading and data from the
mapping is written to a different file, and I see:

  1988 aardwarc CALL  lseek(0x6,0,SEEK_END)
  1988 aardwarc RET   lseek 2378/0x94a
  1988 aardwarc CALL  writev(0x6,0x7fffffffe360,0x2)
  1988 aardwarc PFLT  0x8002634cb 0x1<VM_PROT_READ>
  1988 aardwarc PRET  KERN_PROTECTION_FAILURE
  1988 aardwarc GIO   fd 6 wrote 479 bytes
  <file data>
  1988 aardwarc RET   writev 479/0x1df
  1988 aardwarc CALL  lseek(0x6,0,SEEK_END)
  1988 aardwarc RET   lseek 2570/0xa0a

So we get a page fault while reading from the mapping, which is expected, and
dmu_write_uio_dbuf() returns EFAULT, which is the magic signal for
vn_io_fault1() to retry after wiring the mapping.

Some tracing indicates that dmu_write_uio_dbuf() does manage to write some data
to the file before hitting EFAULT.  In fact, the amount of data written in the
first try is exactly 479 - (2570 - 2378) = 287 bytes.
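
To spell out that arithmetic with the numbers from the ktrace output, here is
a small standalone C sketch.  The helper name first_try_bytes() is made up for
illustration and is not part of any kernel or ZFS interface; it only restates
the subtraction above.

```c
#include <assert.h>

/*
 * Hypothetical helper (not a kernel API): given the byte count reported
 * by writev(2) and the lseek(SEEK_END) results before and after the
 * call, return how many of the reported bytes are NOT reflected in the
 * file's growth.
 */
static long
first_try_bytes(long reported, long size_before, long size_after)
{
	long growth;

	assert(size_after >= size_before);
	growth = size_after - size_before;	/* bytes the file grew by */
	return (reported - growth);
}
```

With the values from the trace, first_try_bytes(479, 2378, 2570) evaluates to
287: writev() reported 479 bytes, but the file grew by only 2570 - 2378 = 192.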

I think the bug is that the EFAULT causes this bit of code in zfs_write() to be
skipped:

                /*
                 * Update the file size (zp_size) if it has changed;
                 * account for possible concurrent updates.
                 */
                while ((end_size = zp->z_size) < zfs_uio_offset(uio)) {
                        (void) atomic_cas_64(&zp->z_size, end_size,
                            zfs_uio_offset(uio));
                        ASSERT(error == 0);
                }

z_size holds the file size returned by VOP_GETATTR(), which in turn provides
the return value for lseek(SEEK_END).  But... this bug appears to exist in main
too.  So how does this work at all?
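
As an illustration of the suspected failure mode, here is a toy user-space
model in C.  It is not kernel code, and every name in it (toy_file,
toy_append, toy_write_with_retry) is hypothetical.  It assumes, as one reading
consistent with the trace, that the retry writes only the remaining bytes and
that an O_APPEND write starts at the cached size; neither assumption is
confirmed here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a file: backing bytes plus a cached size (like z_size). */
struct toy_file {
	unsigned char	data[4096];
	size_t		size;	/* what lseek(SEEK_END) would report */
};

/*
 * Toy append: copy at most fault_after bytes, then "fault".  Returns the
 * number of bytes copied.  Mirroring the suspected bug, the cached size
 * is updated only when the whole request completes without a fault.
 */
static size_t
toy_append(struct toy_file *f, const unsigned char *buf, size_t len,
    size_t fault_after)
{
	size_t off = f->size;	/* assumed O_APPEND behaviour */
	size_t n = len < fault_after ? len : fault_after;

	assert(off + n <= sizeof(f->data));
	memcpy(f->data + off, buf, n);
	if (n == len)
		f->size = off + n;	/* skipped when the copy faulted */
	return (n);
}

/* Toy retry: after a faulted attempt, write only the remainder (assumed). */
static size_t
toy_write_with_retry(struct toy_file *f, const unsigned char *buf,
    size_t len, size_t fault_after)
{
	size_t done = toy_append(f, buf, len, fault_after);

	if (done < len)
		done += toy_append(f, buf + done, len - done, len);
	return (done);
}
```

Starting from a size of 2378 and faulting after 287 of 479 bytes, this model
reports 479 bytes written but leaves the size at 2570, matching the trace: the
second attempt starts at the stale size and overwrites part of what the first
attempt stored.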

Note, part of the weirdness here comes from the fact that some of the input
file data is written in the first try.  I would expect dmu_write_uio_dbuf() to
return EFAULT without having written anything.
