Date: Wed, 10 Nov 2021 10:53:29 -0500
From: Mark Johnston
To: jschauma@netmeister.org
Cc: freebsd-net@freebsd.org
Subject: Re: AF_UNIX socketpair dgram queue sizes
References: <20211110015719.GY3553@netmeister.org> <20211110050533.GA11277@netmeister.org>
In-Reply-To: <20211110050533.GA11277@netmeister.org>

On Wed, Nov 10, 2021 at 12:05:33AM -0500, Jan Schaumann via freebsd-net wrote:
> Mark Johnston wrote:
>
> > There is an additional factor: wasted space. When writing data to a
> > socket, the kernel buffers that data in mbufs. All mbufs have some
> > amount of embedded storage, and the kernel accounts for that storage,
> > whether or not it's used. With small byte datagrams there can be a lot
> > of overhead;
>
> I'm observing two mbufs being allocated for each
> datagram for small datagrams, but only one mbuf for
> larger datagrams.
>
> That seems counter-intuitive to me?

From my reading, sbappendaddr_locked_internal() will always allocate an
extra mbuf for the address, so I can't explain this. What's the
threshold for "larger"? How are you counting mbuf allocations?

> > The kern.ipc.sockbuf_waste_factor sysctl controls the upper limit on
> > total bytes (used or not) that may be enqueued in a socket buffer. The
> > default value of 8 means that we'll waste up to 7 bytes per byte of
> > data, I think. Setting it higher should let you enqueue more messages.
>
> Ah, this looks like something relevant.
>
> Setting kern.ipc.sockbuf_waste_factor=1, I can only
> write 8 1-byte datagrams. For any increase of the
> waste factor by one, I get another 8 1-byte datagrams,
> up until waste factor > 29, at which point we hit
> recvspace: 30 * 8 = 240, so 240 1-byte datagrams with
> 16 bytes dgram overhead means we get 240*17 = 4080
> bytes, which just fits (well, with room for one empty
> 16-byte dgram) into the recvspace = 4096.
>
> But I still don't get the direct relationship between
> the waste factor and the recvspace / buffer queue:
> with a waste_factor of 1 and a datagram with 1972
> bytes, I'm able to write one dgram with 1972 bytes +
> 1 dgram with 1520 bytes = 3492 bytes (plus 2 * 16
> bytes overhead = 3524 bytes). There'd still have been
> space for 572 more bytes in the second dgram.

For a datagram of size 1972, we'll allocate one mbuf (size 256 bytes)
and one mbuf "cluster" (2048 bytes), and then a second 256 byte mbuf
for the address. So sb_mbcnt will be 2560 bytes, leaving 1536 bytes of
space for a second datagram.

> Likewise, trying to write a single 1973-byte dgram fills
> the queue and no additional bytes can be written in a
> second dgram, but I can write a single 2048-byte
> dgram.

I suspect that this bit of the unix socket code might be related:
https://cgit.freebsd.org/src/tree/sys/kern/uipc_usrreq.c#n1144

Here we get the amount of space available in the recv buffer (sbcc) and
compare it with the data limit in the _send_ buffer to determine
whether to apply backpressure. You wrote "SO_SNDBUF = 2048" in your
first email, and if that's the case here then writing ~2000 bytes would
cause the limit to be hit. I'm not sure why 1973 is the magic value
here.

> Still confused...
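
In case it helps to pin this down outside of your application, below is
a rough, untested sketch of the kind of test program I have in mind: it
fills one side of an AF_UNIX SOCK_DGRAM socketpair with fixed-size
datagrams until the kernel pushes back, and reports how many fit. The
DGRAM_SIZE constant is just a placeholder of mine, not anything from
the kernel; vary it by hand (1, 1972, 1973, 2048, ...) together with
kern.ipc.sockbuf_waste_factor to see where the accounting limit kicks
in.

/*
 * Sketch only: count how many fixed-size datagrams fit in one side of
 * an AF_UNIX SOCK_DGRAM socketpair before the kernel applies
 * backpressure.
 */
#include <sys/socket.h>

#include <err.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define	DGRAM_SIZE	1	/* placeholder; try 1, 1972, 1973, 2048 */

int
main(void)
{
	char buf[DGRAM_SIZE];
	int count, fds[2], flags;

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, fds) != 0)
		err(1, "socketpair");

	/* Non-blocking, so a full buffer gives an error instead of blocking. */
	if ((flags = fcntl(fds[0], F_GETFL)) == -1 ||
	    fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) == -1)
		err(1, "fcntl");

	memset(buf, 'x', sizeof(buf));
	for (count = 0;; count++) {
		if (send(fds[0], buf, sizeof(buf), 0) == -1) {
			if (errno == ENOBUFS || errno == EAGAIN)
				break;
			err(1, "send");
		}
	}
	printf("queued %d datagrams of %d bytes\n", count, DGRAM_SIZE);

	/* Hold the sockets open so the queued mbufs stay visible. */
	pause();
	return (0);
}

While the program is paused, comparing "netstat -m" (or
"vmstat -z | grep mbuf") against an idle baseline should also show
roughly how many mbufs and clusters each queued datagram costs.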