Re: git: d15792780760 - main - unix: new implementation of unix/stream & unix/seqpacket
Date: Tue, 06 May 2025 15:06:54 UTC
In message <202505051956.545JuOPR085707@gitrepo.freebsd.org>, Gleb Smirnoff
writes:
> The branch main has been updated by glebius:
>
> URL: https://cgit.FreeBSD.org/src/commit/?id=d15792780760ef94647af9b377b5f0a80e1826bc
>
> commit d15792780760ef94647af9b377b5f0a80e1826bc
> Author: Gleb Smirnoff <glebius@FreeBSD.org>
> AuthorDate: 2025-05-05 19:56:04 +0000
> Commit: Gleb Smirnoff <glebius@FreeBSD.org>
> CommitDate: 2025-05-05 19:56:04 +0000
>
> unix: new implementation of unix/stream & unix/seqpacket
>
> [this is an updated version of d80a97def9a1, that had been reverted]
>
> Provide protocol specific pr_sosend and pr_soreceive for PF_UNIX
> SOCK_STREAM sockets and implement SOCK_SEQPACKET sockets as an extension
> of SOCK_STREAM. The change meets three goals: get rid of unix(4) specific
> stuff in the generic socket code, provide a faster and robust unix/stream
> sockets and bring unix/seqpacket much closer to specification. Highlights
> follow:
>
> - The send buffer now is truly bypassed. Previously it was always empty,
>   but the send(2) still needed to acquire its lock and do a variety of
>   tricks to be woken up in the right time while sleeping on it. Now the
>   only two things we care about in the send buffer is the I/O sx(9) lock
>   that serializes operations and value of so_snd.sb_hiwat, which we can read
>   without obtaining a lock. The sleep of a send(2) happens on the mutex of
>   the receive buffer of the peer. A bulk send/recv of data with large
>   socket buffers will make both syscalls just bounce between owning the
>   receive buffer lock and copyin(9)/copyout(9), no other locks would be
>   involved. Since event notification mechanisms, such as select(2), poll(2)
>   and kevent(2) use state of the send buffer to monitor writability, the new
>   implementation provides protocol specific pr_sopoll and pr_kqfilter. The
>   sendfile(2) over unix/stream is preserved, providing protocol specific
>   pr_send and pr_sendfile_wait methods.
>
> - The implementation uses new mchain structure to manipulate mbuf chains.
>   Note that this required converting to mchain two functions that are shared
>   with unix/dgram: unp_internalize() and unp_addsockcred() as well as adding
>   a new shared one uipc_process_kernel_mbuf(). This induces some
>   non-functional changes in the unix/dgram code as well. There is a space for
>   improvement here, as right now it is a mix of mchain and manually managed
>   mbuf chains.
>
> - unix/seqpacket previously marked as PR_ADDR & PR_ATOMIC and thus treated
>   as a datagram socket by the generic socket code, now becomes a true stream
>   socket with record markers.
>
> - Note on aio(4). First problem with socket aio(4) is that it uses socket
>   buffer locks for queueing and piggybacking on this locking it calls
>   soreadable() and sowriteable() directly. Ideally it should use
>   pr_sopoll() method. Second problem is that unlike a syscall, aio(4) wants
>   a consistent uio structure upon return. This is incompatible with our
>   speculative read optimization, so in case of aio(4) write we need to
>   restore consistency of uio. At this point we workaround those problems
>   on the side of unix(4), but ideally those workarounds should be socket
>   aio(4) problem (not a first class citizen) rather than problem of unix(4),
>   definitely a primary facility.
> ---
>  sys/kern/uipc_usrreq.c | 1722 +++++++++++++++++++++++++++++++++++++-----------
>  sys/sys/sockbuf.h      |   12 +
>  2 files changed, 1343 insertions(+), 391 deletions(-)
>

Is anyone else experiencing rsync hangs with this? I'm noticing rsync hangs
with large files, like those in /usr/obj or a poudriere package repo.

I haven't reverted this locally yet, just guessing ATM.

-- 
Cheers,
Cy Schubert <Cy.Schubert@cschubert.com>
FreeBSD UNIX:  <cy@FreeBSD.org>   Web:  https://FreeBSD.org
NTP:           <cy@nwtime.org>    Web:  https://nwtime.org

			e^(i*pi)+1=0
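
The bulk send/recv path described in the quoted commit message could be
exercised without rsync by a small userland test. What follows is only a
minimal sketch under stated assumptions, not a confirmed reproducer for the
hang: bulk_transfer() and its parameters are hypothetical names invented
here, and whether rsync's own traffic pattern matches this is a guess. It
pushes a fixed number of bytes through a PF_UNIX SOCK_STREAM socketpair(2)
and reports how many bytes the reader saw, with alarm(3) in the writer so
that a stuck send terminates instead of hanging forever.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the commit): write `total` bytes in
 * `chunk`-sized pieces over a unix/stream socketpair from a child process
 * and read them back in the parent.  Returns the number of bytes received,
 * or -1 on error or if the writer was killed by the timeout.
 */
static long long
bulk_transfer(size_t total, size_t chunk, unsigned timeout_sec)
{
	int sv[2], status;
	long long got = 0;
	ssize_t n;
	pid_t pid;
	char *buf;

	assert(chunk > 0);
	if ((buf = malloc(chunk)) == NULL)
		return (-1);
	memset(buf, 'x', chunk);
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1) {
		free(buf);
		return (-1);
	}
	if ((pid = fork()) == -1) {
		close(sv[0]);
		close(sv[1]);
		free(buf);
		return (-1);
	}
	if (pid == 0) {
		/* Child: writer.  SIGALRM's default action kills a stuck write. */
		size_t left = total;

		close(sv[0]);
		alarm(timeout_sec);
		while (left > 0) {
			size_t want = left < chunk ? left : chunk;

			n = write(sv[1], buf, want);
			if (n <= 0)
				_exit(1);
			left -= (size_t)n;
		}
		_exit(0);
	}
	/* Parent: reader.  Read until EOF, i.e. until the writer is gone. */
	close(sv[1]);
	while ((n = read(sv[0], buf, chunk)) > 0)
		got += n;
	close(sv[0]);
	free(buf);
	if (waitpid(pid, &status, 0) == -1)
		return (-1);
	if (n < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
		return (-1);
	return (got);
}
```

Calling, say, bulk_transfer(1 << 30, 65536, 60) from a main() and comparing
the return value against the requested total would show whether a plain
unix/stream transfer of that size completes on the affected kernel. A -1
return only says the transfer failed or timed out, not why.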