Re: git: d15792780760 - main - unix: new implementation of unix/stream & unix/seqpacket

From: Cy Schubert <Cy.Schubert_at_cschubert.com>
Date: Tue, 06 May 2025 15:06:54 UTC
In message <202505051956.545JuOPR085707@gitrepo.freebsd.org>, Gleb Smirnoff
writes:
> The branch main has been updated by glebius:
>
> URL: https://cgit.FreeBSD.org/src/commit/?id=d15792780760ef94647af9b377b5f0a80e1826bc
>
> commit d15792780760ef94647af9b377b5f0a80e1826bc
> Author:     Gleb Smirnoff <glebius@FreeBSD.org>
> AuthorDate: 2025-05-05 19:56:04 +0000
> Commit:     Gleb Smirnoff <glebius@FreeBSD.org>
> CommitDate: 2025-05-05 19:56:04 +0000
>
>     unix: new implementation of unix/stream & unix/seqpacket
>     
>     [this is an updated version of d80a97def9a1, that had been reverted]
>     
>     Provide protocol specific pr_sosend and pr_soreceive for PF_UNIX
>     SOCK_STREAM sockets and implement SOCK_SEQPACKET sockets as an extension
>     of SOCK_STREAM.  The change meets three goals: get rid of unix(4) specific
>     stuff in the generic socket code, provide a faster and robust unix/stream
>     sockets and bring unix/seqpacket much closer to specification.  Highlights
>     follow:
>     
>     - The send buffer now is truly bypassed.  Previously it was always empty,
>     but the send(2) still needed to acquire its lock and do a variety of
>     tricks to be woken up in the right time while sleeping on it.  Now the
>     only two things we care about in the send buffer is the I/O sx(9) lock
>     that serializes operations and value of so_snd.sb_hiwat, which we can read
>     without obtaining a lock.  The sleep of a send(2) happens on the mutex of
>     the receive buffer of the peer.  A bulk send/recv of data with large
>     socket buffers will make both syscalls just bounce between owning the
>     receive buffer lock and copyin(9)/copyout(9), no other locks would be
>     involved.  Since event notification mechanisms, such as select(2), poll(2)
>     and kevent(2) use state of the send buffer to monitor writability, the new
>     implementation provides protocol specific pr_sopoll and pr_kqfilter.  The
>     sendfile(2) over unix/stream is preserved, providing protocol specific
>     pr_send and pr_sendfile_wait methods.
>     
>     - The implementation uses new mchain structure to manipulate mbuf chains.
>     Note that this required converting to mchain two functions that are shared
>     with unix/dgram: unp_internalize() and unp_addsockcred() as well as adding
>     a new shared one uipc_process_kernel_mbuf().  This induces some non-
>     functional changes in the unix/dgram code as well.  There is a space for
>     improvement here, as right now it is a mix of mchain and manually managed
>     mbuf chains.
>     
>     - unix/seqpacket previously marked as PR_ADDR & PR_ATOMIC and thus treated
>     as a datagram socket by the generic socket code, now becomes a true stream
>     socket with record markers.
>     
>     - Note on aio(4).  First problem with socket aio(4) is that it uses socket
>     buffer locks for queueing and piggybacking on this locking it calls
>     soreadable() and sowriteable() directly.  Ideally it should use
>     pr_sopoll() method.  Second problem is that unlike a syscall, aio(4) wants
>     a consistent uio structure upon return.  This is incompatible with our
>     speculative read optimization, so in case of aio(4) write we need to
>     restore consistency of uio.   At this point we workaround those problems
>     on the side of unix(4), but ideally those workarounds should be socket
>     aio(4) problem (not a first class citizen) rather than problem of unix(4),
>     definitely a primary facility.
> ---
>  sys/kern/uipc_usrreq.c | 1722 +++++++++++++++++++++++++++++++++++++-----------
>  sys/sys/sockbuf.h      |   12 +
>  2 files changed, 1343 insertions(+), 391 deletions(-)
>
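
For reference, the "record markers" behaviour described for unix/seqpacket
can be observed from userland. The following is only a minimal sketch, not
code from the commit: seqpacket_record_sizes() is a hypothetical helper name,
and it relies on nothing beyond the standard socketpair(2), send(2) and
recv(2) calls. It sends two records and reports how many bytes each recv(2)
returns.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the commit): send two records over a
 * unix/seqpacket socketpair and report the size returned by each recv(2).
 * Returns 0 on success, -1 on any error.
 */
int
seqpacket_record_sizes(ssize_t *first, ssize_t *second)
{
	int sv[2];
	char buf[64];
	int rc = -1;

	if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) == -1)
		return (-1);

	/* Two separate records: 5 bytes, then 3 bytes. */
	if (send(sv[0], "hello", 5, 0) != 5 || send(sv[0], "abc", 3, 0) != 3)
		goto out;

	/* Each recv(2) should return one record, although buf is larger. */
	*first = recv(sv[1], buf, sizeof(buf), 0);
	*second = recv(sv[1], buf, sizeof(buf), 0);
	if (*first != -1 && *second != -1)
		rc = 0;
out:
	close(sv[0]);
	close(sv[1]);
	return (rc);
}
```

With SOCK_SEQPACKET the two reads are expected to return the two records
separately; with SOCK_STREAM the same calls could return all 8 bytes in one
read.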

Is anyone else seeing rsync hangs since this commit? I'm seeing rsync hang
on large files, such as those in /usr/obj or a poudriere package repo.

I haven't reverted this locally yet, so this is only a guess at the moment.
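
One way to narrow this down without rsync might be a small bulk-transfer test
over a unix/stream socketpair. The sketch below is an assumption-laden
starting point, not a known reproducer: unix_stream_bulk() is a hypothetical
helper, the 64 KiB chunk size is arbitrary, and it uses plain blocking
read(2)/write(2), which may not match rsync's real I/O pattern. A child
writes the requested number of bytes while the parent drains them, and
alarm(3) terminates the process if no data arrives within the timeout.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical stress helper: a child writes `total` bytes into one end of
 * a unix/stream socketpair while the parent drains the other end.  If the
 * reader makes no progress for `timeout` seconds, the default SIGALRM
 * action terminates the process, which makes a stall visible.
 * Returns the number of bytes received, or -1 on error.
 */
long long
unix_stream_bulk(size_t total, unsigned int timeout)
{
	static char buf[64 * 1024];	/* arbitrary chunk size */
	int sv[2];
	pid_t pid;
	long long got = 0;
	ssize_t n;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
		return (-1);
	pid = fork();
	if (pid == -1) {
		close(sv[0]);
		close(sv[1]);
		return (-1);
	}
	if (pid == 0) {
		/* Child: writer. */
		size_t left = total;

		close(sv[1]);
		memset(buf, 'x', sizeof(buf));
		while (left > 0) {
			size_t chunk = left < sizeof(buf) ? left : sizeof(buf);

			n = write(sv[0], buf, chunk);
			if (n <= 0)
				_exit(1);
			left -= (size_t)n;
		}
		_exit(0);
	}

	/* Parent: reader.  Re-arm the watchdog after every successful read. */
	close(sv[0]);
	alarm(timeout);
	while ((n = read(sv[1], buf, sizeof(buf))) > 0) {
		got += n;
		alarm(timeout);
	}
	alarm(0);
	close(sv[1]);
	waitpid(pid, NULL, 0);
	return (n == -1 ? -1 : got);
}
```

Calling, say, unix_stream_bulk(1024UL * 1024 * 1024, 30) before and after
reverting the commit would show whether a plain blocking transfer stalls. A
clean run would not rule out the commit, since a hang could depend on
non-blocking I/O or on select(2)/poll(2) behaviour that this sketch does not
exercise.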


-- 
Cheers,
Cy Schubert <Cy.Schubert@cschubert.com>
FreeBSD UNIX:  <cy@FreeBSD.org>   Web:  https://FreeBSD.org
NTP:           <cy@nwtime.org>    Web:  https://nwtime.org

			e^(i*pi)+1=0