Does FreeBSD have sendmmsg or recvmmsg system calls?

Boris Astardzhiev boris.astardzhiev at gmail.com
Thu Jan 7 13:37:13 UTC 2016


First of all thanks for the insightful code review.

>>On Thu, Jan 07, 2016 at 12:28:32PM +0200, Boris Astardzhiev wrote:
>>See inline comments and final notes at the end.
>>
>> diff --git a/lib/libc/include/libc_private.h
b/lib/libc/include/libc_private.h
>> index 5caf9a3..7e2a902 100644
>> --- a/lib/libc/include/libc_private.h
>> +++ b/lib/libc/include/libc_private.h
>> @@ -200,8 +200,10 @@ enum {
>>       INTERPOS_pselect,
>>       INTERPOS_recvfrom,
>>       INTERPOS_recvmsg,
>> +     INTERPOS_recvmmsg,
>>       INTERPOS_select,
>>       INTERPOS_sendmsg,
>> +     INTERPOS_sendmmsg,
>>       INTERPOS_sendto,
>>       INTERPOS_setcontext,
>>       INTERPOS_sigaction,
>The interposing table must be extended at the end, and not in the middle.
>otherwise you introduce too large inconsistence with older libthr.
>
>That said, you changed libc table, but did not updated libthr.  The result
>is random segfaults in the multithreaded processes.

Thanks for this catch. I'll fix it.

>> diff --git a/lib/libc/sys/Symbol.map b/lib/libc/sys/Symbol.map
>> index 7b3257c..6cc3c6e 100644
>> --- a/lib/libc/sys/Symbol.map
>> +++ b/lib/libc/sys/Symbol.map
>> @@ -220,6 +220,7 @@ FBSD_1.0 {
>>       reboot;
>>       recvfrom;
>>       recvmsg;
>> +     recvmmsg;
>>       rename;
>>       revoke;
>>       rfork;
>> @@ -240,6 +241,7 @@ FBSD_1.0 {
>>       semsys;
>>       sendfile;
>>       sendmsg;
>> +     sendmmsg;
>>       sendto;
>>       setaudit;
>>       setaudit_addr;
>The versioning is wrong, new non-private symbols for 11.0 go into _1.4.
>
>> @@ -851,6 +853,8 @@ FBSDprivate_1.0 {
>>       __sys_recvfrom;
>>       _recvmsg;
>>       __sys_recvmsg;
>> +     _recvmmsg;
>> +     __sys_recvmmsg;
>>       _rename;
>>       __sys_rename;
>>       _revoke;
>> @@ -891,6 +895,8 @@ FBSDprivate_1.0 {
>>       __sys_sendfile;
>>       _sendmsg;
>>       __sys_sendmsg;
>> +     _sendmmsg;
>> +     __sys_sendmmsg;
>>       _sendto;
>>       __sys_sendto;
>>       _setaudit;

I see. I'll put everything in FBSD_1.4.

>> diff --git a/sys/kern/uipc_syscalls.c b/sys/kern/uipc_syscalls.c
>> index c33a2cf..7354b4f 100644
>> --- a/sys/kern/uipc_syscalls.c
>> +++ b/sys/kern/uipc_syscalls.c
>
>> +int
>> +sys_recvmmsg(td, uap)
>> +     struct thread *td;
>> +     struct recvmmsg_args /* {
>> +             int     s;
>> +             struct mmsghdr  *msgvec;
>> +             unsigned int    vlen;
>> +             int     flags;
>> +     } */ *uap;
>Use ANSI C definitions for new code, please.
>I personally do not inline uap declaration, the time has gone.

I'll fix it.

>> +{
>> +     struct mmsghdr *vec, *mmp;
>> +     struct msghdr *mp, msg;
>> +     struct iovec *uiov, *iov;
>> +     unsigned int vlen, len;
>> +     int i, rcvd = 0, error;
>> +     struct socket *so = NULL;
>Style, no initialization in declaration.

I'll fix it.

>> +     struct file *fp;
>> +     cap_rights_t rights;
>> +
>> +     if (fget(td, uap->s, cap_rights_init(&rights, CAP_RECV), &fp) != 0)
>> +             return (EBADF);
>> +
>> +     so = fp->f_data;
>What if uap->s filedescriptor does not reference a socket ?

kern_sendit/recvit() eventually call getsock_cap().
To me this means that send/recv(m)msg have to be limited only to sockets.
I was in a need to grab access to so_error due to the error handling later.
Probably I need to use something similar like kern_sendit/recvit's
getsock_cap approach.
Comments or a better approach to so_error?

>>> +
>>> +     vlen = uap->vlen;
>>> +     if (vlen > UIO_MAXIOV)
>>> +             vlen = UIO_MAXIOV;
>>> +
>>> +     len = vlen * sizeof(*uap->msgvec);
>>> +     vec = malloc(len, M_TEMP, M_WAITOK);
>>This is a user-controlled malloc(9), i.e. an easy kernel panic due to
>>impossible to satisfy request, or KVA and memory exhaustion.

Having a glance in the manpage there it is stated that:
     M_WAITOK
             Indicates that it is OK to wait for resources.  If the request
             cannot be immediately fulfilled, the current process is put to
             sleep to wait for resources to be released by other processes.
             The malloc(), realloc(), and reallocf() functions cannot return
             NULL if M_WAITOK is specified.
I.e we'll be put to sleep and no NULL will be returned. I don't see how I'll
get a panic here. Or you're stressing the malloc type M_TEMP here?
More clarifications?

>> +
>> +     error = copyin(uap->msgvec, vec, len);
>> +     if (error != 0)
>> +             goto out;
>> +
>> +     for (i = 0; i < vlen; i++) {
>> +             mmp = &vec[i];
>> +             mp = &mmp->msg_hdr;
>> +             msg = *mp;
>> +
>> +             error = copyiniov(msg.msg_iov, msg.msg_iovlen, &iov,
EMSGSIZE);
>> +             if (error != 0)
>> +                     goto out;
>> +
>> +             msg.msg_flags = uap->flags;
>> +#ifdef COMPAT_OLDSOCK
>> +             msg.msg_flags &= ~MSG_COMPAT;
>> +#endif
>> +             uiov = msg.msg_iov;
>> +             msg.msg_iov = iov;
>> +
>> +             error = recvit(td, uap->s, &msg, NULL);
>> +             if (error == 0) {
>> +                     msg.msg_iov = uiov;
>> +                     error = copyout(&msg, &uap->msgvec[i].msg_hdr,
sizeof(msg));
>> +                     if (error != 0) {
>> +                             free(iov, M_IOV);
>> +                             goto out;
>> +                     }
>> +                     error = copyout(&td->td_retval[0],
&uap->msgvec[i].msg_len,
>> +                         sizeof(td->td_retval[0]));
>> +                     if (error != 0) {
>> +                             free(iov, M_IOV);
>> +                             goto out;
>> +                     }
>> +             }
>> +             free(iov, M_IOV);
>> +             rcvd++;
>> +     }
>> +
>> +     td->td_retval[0] = rcvd;
>> +
>> +out:
>> +     if (error != 0 && rcvd != 0 && rcvd != vlen) {
>> +             so->so_error = error;
>> +             error = 0;
>> +             td->td_retval[0] = rcvd;
>> +     }
>> +
>> +     fdrop(fp, td);
>> +
>> +     free(vec, M_TEMP);
>> +     return (error);
>> +}
>> +
>>  /* ARGSUSED */
>>  int
>>  sys_shutdown(td, uap)
>> diff --git a/sys/sys/socket.h b/sys/sys/socket.h
>> index 18e2de1..d352cd2 100644
>> --- a/sys/sys/socket.h
>> +++ b/sys/sys/socket.h
>> @@ -414,6 +414,11 @@ struct msghdr {
>>       int              msg_flags;             /* flags on received
message */
>>  };
>>
>> +struct mmsghdr {
>> +     struct msghdr   msg_hdr;                /* message header */
>> +     unsigned int    msg_len;                /* message length  */
>> +};
>> +
>>  #define      MSG_OOB         0x1             /* process out-of-band
data */
>>  #define      MSG_PEEK        0x2             /* peek at incoming
message */
>>  #define      MSG_DONTROUTE   0x4             /* send without using
routing tables */
>> @@ -615,10 +620,12 @@ int     listen(int, int);
>>  ssize_t      recv(int, void *, size_t, int);
>>  ssize_t      recvfrom(int, void *, size_t, int, struct sockaddr *
__restrict, socklen_t * __restrict);
>>  ssize_t      recvmsg(int, struct msghdr *, int);
>> +ssize_t      recvmmsg(int, struct mmsghdr *, unsigned int, int);
>>  ssize_t      send(int, const void *, size_t, int);
>>  ssize_t      sendto(int, const void *,
>>           size_t, int, const struct sockaddr *, socklen_t);
>>  ssize_t      sendmsg(int, const struct msghdr *, int);
>> +ssize_t      sendmmsg(int, const struct mmsghdr *, unsigned int, int);
>Why did you put new functions into POSIX namespace ?
>They are GNU (and now BSD) extensions, at least I do not see then in SUS.
>
>>  #if __BSD_VISIBLE
>>  int  sendfile(int, int, off_t, size_t, struct sf_hdtr *, off_t *, int);
>>  int  setfib(int);

I agree. Guess I'll have to put it ifdef'd under __BSD_VISIBLE since it's
not POSIX.

>The patch lacks man page update, and lacks the freebsd32 compat shims.

I'll work on these later.

>What are the reasons to go with the trivial kernel implementation
>right now, instead of trivial and simpler libc wrapper ?  I think the
>speed would be same, but the code _much_ simpler.

My initial idea was such that with the kernel approach we get closer to the
driver itself and if it has mm optimizations we may have a benefit in
future.
And thus one day we won't need to redo the work scrubbing the libc-only
implementation.

Best regards,
Boris Astardzhiev

On Thu, Jan 7, 2016 at 12:56 PM, Luigi Rizzo <rizzo at iet.unipi.it> wrote:

> Hi Boris,
> thanks for working on this.
>
> A few comments:
>
> - do you have any performance data on calling *mmsg() versus
>   multiple invocations of the equivalent *msg() ?
>
> - in the following chunk in recvmmsg.c , there are slight
>   type differences between the function and the internal cast.
>   Same in sendmmsg.c
>
>     +ssize_t
>     +recvmmsg(int s, struct mmsghdr *msgvec, unsigned int vlen, int flags)
>     +{
>     +
>     +       return (((int (*)(int, struct mmsghdr *, int, int))
>     +           __libc_interposing[INTERPOS_recvmmsg])(s, msgvec, vlen,
> flags));
>     +}
>
>
> - why did you add a cap_rights_init() to the functions when
>   neither sendmsg or recvmsg have it (in other words, this is
>   a worthwhile addition but perhaps should be done later
>
> - the initial part of the two functions sys_*mmsg() is almost
>   exactly the same so can you define a function with the common code ?
>
> - you could probably avoid the repeated malloc/free of the iov
>   with a first loop that finds the max iov size and does the allocation
>   only once. copyiniov can then be replaced by a copyin
>
> - (minor) what is the point of copying the current entry into msg rather
> than
>   just using mp-> ...
>
> - can you add a comment explaining why you do the following
>
>                error = copyout(&td->td_retval[0], &uap->msgvec[i].msg_len,
>                    sizeof(td->td_retval[0]));
>
>
> cheers
> luigi
>
> On Thu, Jan 7, 2016 at 1:51 AM, Boris Astardzhiev
> <boris.astardzhiev at gmail.com> wrote:
> > Hello,
> >
> > Here's my implementation of the two system calls. The patch is against
> HEAD
> > from a couple of days:
> > commit ff9e83788d7ed52342dcba4dff1e62fdf3cc985c
> > Author: ngie <ngie at FreeBSD.org>
> > Date:   Mon Jan 4 03:34:22 2016 +0000
> >
> >     Remove free'ing of an uninitialized variable
> >
> >     Just remove it completely from the test as it's initialized but
> unused
> > apart
> >     from the free(3) call
> >
> >     Differential Revision: https://reviews.freebsd.org/D4769 (part of
> > larger diff)
> >     MFC after: 5 days
> >     Reported by: cppcheck
> >     Reviewed by: oshogbo
> >     Sponsored by: EMC / Isilon Storage Division
> >
> > I've made brief tests using the examples in the manpages in Linux
> skipping
> > the timeout stuff:
> > http://man7.org/linux/man-pages/man2/sendmmsg.2.html
> > http://man7.org/linux/man-pages/man2/recvmmsg.2.html
> >
> > I've tried to stick to the comments in the thread but any
> > suggestions/testings
> > are also welcomed so that I can fix the calls accordingly.
> >
> > Regards,
> > Boris Astardzhiev
> >
> >
> > On Mon, Jan 4, 2016 at 11:07 PM, Mark Delany <c2h at romeo.emu.st> wrote:
> >
> >> > You just repeat arguments for the text in my messages, which you
> removed
> >> > on reply.
> >>
> >> My goal is to get you to scruitinize.
> >>
> >> Thank you for helping.
> >>
> >>
> >> Mark.
> >> _______________________________________________
> >> freebsd-net at freebsd.org mailing list
> >> https://lists.freebsd.org/mailman/listinfo/freebsd-net
> >> To unsubscribe, send any mail to "freebsd-net-unsubscribe at freebsd.org"
> >>
> >
> > _______________________________________________
> > freebsd-net at freebsd.org mailing list
> > https://lists.freebsd.org/mailman/listinfo/freebsd-net
> > To unsubscribe, send any mail to "freebsd-net-unsubscribe at freebsd.org"
>
>
>
> --
> -----------------------------------------+-------------------------------
>  Prof. Luigi RIZZO, rizzo at iet.unipi.it  . Dip. di Ing. dell'Informazione
>  http://www.iet.unipi.it/~luigi/        . Universita` di Pisa
>  TEL      +39-050-2217533               . via Diotisalvi 2
>  Mobile   +39-338-6809875               . 56122 PISA (Italy)
> -----------------------------------------+-------------------------------
>


More information about the freebsd-net mailing list