Re: RFC: Should copy_file_range(2) work for shared memory objects?

From: Rick Macklem <rick.macklem_at_gmail.com>
Date: Thu, 21 Sep 2023 00:30:57 UTC
On Wed, Sep 20, 2023 at 4:52 PM Alan Somers <asomers@freebsd.org> wrote:
>
> On Wed, Sep 20, 2023 at 4:47 PM Rick Macklem <rick.macklem@gmail.com> wrote:
> >
> > On Wed, Sep 20, 2023 at 4:21 PM Rick Macklem <rick.macklem@gmail.com> wrote:
> > >
> > > On Wed, Sep 20, 2023 at 4:09 PM Rick Macklem <rick.macklem@gmail.com> wrote:
> > > >
> > > > On Wed, Sep 20, 2023 at 3:07 PM Alan Somers <asomers@freebsd.org> wrote:
> > > > >
> > > > > On Wed, Sep 20, 2023 at 3:05 PM Rick Macklem <rick.macklem@gmail.com> wrote:
> > > > > >
> > > > > > Right now (as noted by PR#273962) copy_file_range(2)
> > > > > > fails for shared memory objects because there is no
> > > > > > vnode (f_vnode == NULL) for them and the code uses
> > > > > > vnodes (including a file system specific VOP_COPY_FILE_RANGE(9)).
> > > > > >
> > > > > > Do you think copy_file_range(2) should work for shared memory objects?
> > > > > >
> > > > > > This would require specific handling in kern_copy_file_range()
> > > > > > to work.  I do not think the patch would be a lot of work, but
> > > > > > I am not familiar with the f_ops and shared memory code.
> > > > > >
> > > > > > rick
> > > > >
> > > > > This sounds annoying to fix.  But I think we ought to.  Right now
> > > > > programmers can assume that copy_file_range will work for every type
> > > > > of file.  We don't document an EOPNOTSUP error code or anything like
> > > > > that.  Does it work on sockets, too?
> > > > No. I guess I have a different definition of "file" (unless you meant
> > > > "filedesc"?).  I cannot see how a "range is defined for sockets
> > > > or named pipes or...". It currently checks for a f_vnode, which
> > > > probably is not enough. (I haven't figured out what path_fileops
> > > > are, so I do not know if they work?)
> > > >
> > > > I can see how it can be implemented for shared memory objects.
> > > > However, this is going to take a fair amount of work, since they
> > > > do not use vnodes.
> > > > I think it goes something like this:
> > > > - Create a new fileops (f_copy_file_range), since it needs to use
> > > >   the correct range lock variables (in shmfd instead of vnode ones).
> > > > - Move most of kern_copy_file_range() into vnodeop_copy_file_range()
> > > >   and call f_copy_file_range() from kern_copy_file_range().
> > > > - Create a shm_copy_file_range() that does the correct range locking
> > > >   and then copies via uiomove().
> > > > This would be a KABI change, so I do not think it could be MFC'd.
> > > >
> > > > I think there is a need for copy_file_range(2) to return EOPNOTSUP
> > > > for cases it will never handle. (I need to test AF_LOCAL sockets,
> > > > since I think they have vnodes?)
> > > copy_file_range(2) does currently return EOPNOTSUPP for unix
> > > domain (AF_LOCAL) sockets. The man page needs to be fixed,
> > > whether or not support for shared memory objects is added.
> > >
> > Oops, my mistake.  It was the open(2) that failed with EOPNOTSUPP,
> > not copy_file_range(2). (I have a simple test program that open(2)s
> > file names and then uses copy_file_range(2) on the descriptors.
> > Btw, an open(2) with O_PATH works, but no data is copied.
> > Not sure if that should be considered correct behaviour?
>
> Do you mean that copy_file_range returns 0 for AF_LOCAL sockets?  That
> sounds suspicious.  0 could be interpreted as EoF.  Could you please
> share your test program?
No, it meant that copy_file-range(2) never got called.
Btw, if copy_file_range(2) starts returning anything other than EINVAL
for inappropriate file descriptors, cp(1) and maybe cat(1) will need to
be fixed. (cp checks for EINVAL and cat checks for EINVAL and EBADF,
to decide if a non-copy_file_range(2) should be done when copy_file-range(2)
fails.)

rick