Date: Wed, 10 Nov 2021 02:57:47 UTC
On Tue, Nov 09, 2021 at 08:57:20PM -0500, Jan Schaumann via freebsd-net wrote:
> Hello,
>
> I'm trying to wrap my head around the buffer sizes
> relevant to AF_UNIX/PF_LOCAL dgram socketpairs.
>
> On a FreeBSD/amd64 13.0 system, creating a socketpair
> and simply writing a single byte in a loop to the
> non-blocking write end without reading the data, I can
> perform 64 writes before causing EAGAIN, yielding 1088
> bytes in FIONREAD on the read end (indicating 16 bytes
> per datagram overhead).

When transmitting on a unix dgram socket, each message will include a copy of the sender's address, represented by a dummy 16-byte sockaddr in this case. This is stripped by the kernel when receiving, but still incurs overhead with respect to socket buffer accounting.

> This is well below the total net.local.dgram.recvspace
> = 4096 bytes. I would have expected to be able to
> perform 240 1 byte writes (240 + 240*16 = 4080).
>
> Now if I try to write SO_SNDBUF = 2048 bytes on each
> iteration (or subsequently as many as I can until
> EAGAIN), then I can send one datagram with 2048 bytes
> and one datagram with 2016 bytes, filling recvspace as
> (2 * 16) + (2048 + 2016) = 4096.
>
> But at smaller sizes, it looks like the recvspace is
> not filled completely: writes in chunks of > 803 bytes
> will fill recvspace up to 4096 bytes, but below 803
> bytes, recvspace is not maxed out.
>
> Does anybody know why smaller datagrams can't fill
> recvspace? Or what I'm missing / misunderstanding
> about the recvspace here?

There is an additional factor: wasted space. When writing data to a socket, the kernel buffers that data in mbufs. All mbufs have some amount of embedded storage, and the kernel accounts for that storage, whether or not it's used. With small byte datagrams there can be a lot of overhead; with stream sockets the problem is mitigated somewhat by compression, but for datagrams we don't have a smarter mechanism to maintain message boundaries.

The kern.ipc.sockbuf_waste_factor sysctl controls the upper limit on total bytes (used or not) that may be enqueued in a socket buffer. The default value of 8 means that we'll waste up to 7 bytes per byte of data, I think. Setting it higher should let you enqueue more messages. As far as I know, this limit can't be modified directly; it's a function of the waste factor and the socket buffer size.
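
To make the arithmetic concrete, here is a rough back-of-the-envelope sketch in Python. It rests on two assumptions that are not stated above: that an mbuf is 256 bytes (MSIZE) on amd64, and that each small datagram occupies two mbufs, one for the sender's address and one for the data. The function name and constants are only illustrative, and the estimate ignores any global cap on socket buffer size. It only applies to payloads small enough to fit in a single mbuf's embedded storage.

```python
# Assumptions (not from the thread): MSIZE is 256 bytes, and each small
# datagram is charged for two mbufs (one for the address, one for the data).
MSIZE = 256
MBUFS_PER_DATAGRAM = 2

# Values taken from the thread.
ADDR_LEN = 16        # dummy sockaddr prepended to each datagram
RECVSPACE = 4096     # net.local.dgram.recvspace
WASTE_FACTOR = 8     # default kern.ipc.sockbuf_waste_factor


def max_small_datagrams(payload, recvspace=RECVSPACE, waste_factor=WASTE_FACTOR):
    """Estimate how many small datagrams fit before the sender sees EAGAIN."""
    # Limit on bytes actually used: payload plus the 16-byte address.
    by_bytes = recvspace // (payload + ADDR_LEN)
    # Limit on total mbuf storage, used or not: buffer size times waste factor.
    mbuf_limit = recvspace * waste_factor
    by_mbufs = mbuf_limit // (MBUFS_PER_DATAGRAM * MSIZE)
    # Whichever limit is hit first stops further writes.
    return min(by_bytes, by_mbufs)


print(max_small_datagrams(1))
```

Under these assumptions the byte limit alone would allow 240 one-byte writes, but the mbuf storage limit is 8 * 4096 = 32768 bytes, or 64 datagrams at 512 bytes each, which is consistent with the 64 writes you observed. With the same assumptions, max_small_datagrams(1, waste_factor=16) gives 128.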