Re: RFC: Should copy_file_range(2) work for shared memory objects?

From: Rick Macklem <rick.macklem_at_gmail.com>
Date: Wed, 20 Sep 2023 23:21:46 UTC
On Wed, Sep 20, 2023 at 4:09 PM Rick Macklem <rick.macklem@gmail.com> wrote:
>
> On Wed, Sep 20, 2023 at 3:07 PM Alan Somers <asomers@freebsd.org> wrote:
> >
> > On Wed, Sep 20, 2023 at 3:05 PM Rick Macklem <rick.macklem@gmail.com> wrote:
> > >
> > > Right now (as noted by PR#273962) copy_file_range(2)
> > > fails for shared memory objects because there is no
> > > vnode (f_vnode == NULL) for them and the code uses
> > > vnodes (including a file system specific VOP_COPY_FILE_RANGE(9)).
> > >
> > > Do you think copy_file_range(2) should work for shared memory objects?
> > >
> > > This would require specific handling in kern_copy_file_range()
> > > to work.  I do not think the patch would be a lot of work, but
> > > I am not familiar with the f_ops and shared memory code.
> > >
> > > rick
> >
> > This sounds annoying to fix.  But I think we ought to.  Right now
> > programmers can assume that copy_file_range will work for every type
> > of file.  We don't document an EOPNOTSUP error code or anything like
> > that.  Does it work on sockets, too?
> No. I guess I have a different definition of "file" (unless you meant
> "filedesc"?).  I cannot see how a "range is defined for sockets
> or named pipes or...". It currently checks for a f_vnode, which
> probably is not enough. (I haven't figured out what path_fileops
> are, so I do not know if they work?)
>
> I can see how it can be implemented for shared memory objects.
> However, this is going to take a fair amount of work, since they
> do not use vnodes.
> I think it goes something like this:
> - Create a new fileops (f_copy_file_range), since it needs to use
>   the correct range lock variables (in shmfd instead of vnode ones).
> - Move most of kern_copy_file_range() into vnodeop_copy_file_range()
>   and call f_copy_file_range() from kern_copy_file_range().
> - Create a shm_copy_file_range() that does the correct range locking
>   and then copies via uiomove().
> This would be a KABI change, so I do not think it could be MFC'd.
>
> I think there is a need for copy_file_range(2) to return EOPNOTSUP
> for cases it will never handle. (I need to test AF_LOCAL sockets,
> since I think they have vnodes?)
copy_file_range(2) does currently return EOPNOTSUPP for unix
domain (AF_LOCAL) sockets. The man page needs to be fixed,
whether or not support for shared memory objects is added.

rick

>
> rick