List-Id: Networking and TCP/IP with FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-net
Sender: owner-freebsd-net@freebsd.org
References: <2c31ac44-b34b-469c-a6de-fdd927ec2f9e@freebsd.org>
Date: Fri, 02 Feb 2024 19:47:40 -0500
From: "Drew Gallatin"
To: "Rick Macklem", "Richard Scheffenegger"
Cc: "freebsd-net@FreeBSD.org", "FreeBSD Transport", rmacklem@freebsd.org, kp@freebsd.org
Subject: Re: Increasing TCP TSO size support

On Fri, Feb 2, 2024, at 6:13 PM, Rick Macklem wrote:
> A factor here is the if_hw_tsomaxsegcount limit. For example, a 1Mbyte NFS write request
> or read reply will result in a 514 element mbuf chain. Each of these (mostly 2K mbuf clusters)
> are non-contiguous data segments. (I suspect most NICs do not handle this many segments well,
> if at all.)

Excellent point.

> The NFS code does know how to use M_EXTPG mbufs (for NFS over TLS, for the ktls), but I do not
> know what it would take to make these work for non-KTLS TSO?

Sendfile already uses M_EXTPG mbufs. When I was initially doing the
M_EXTPG work for kTLS, I added support for using M_EXTPG mbufs in
sendfile regardless of whether or not kTLS was in use. That reduced CPU
use marginally on 64-bit platforms (due to shorter socket buffer mbuf
chains, and hence less pointer chasing), and quite a bit more on 32-bit
platforms (due to also not needing to map memory into the kernel map,
and to reducing pointer chasing even more, as more pages fit into an
M_EXTPG mbuf when a paddr_t is 32 bits).

> I do not know how the TSO loop in tcp_output handles M_EXTPG mbufs.
> Does it assume each M_EXTPG mbuf is one contiguous data segment?

No, it's fully aware of how to handle M_EXTPG mbufs. Look at
tcp_m_copym(). We added code in the segment-counting part of that
function to count the header/trailer parts of an M_EXTPG mbuf, and to
deal with the start/end page being misaligned.

> I do see that ip_output() will call mb_unmapped_to_ext() when the NIC does not have IFCAP_MEXTPG set.
> (If IFCAP_MEXTPG is set, do the pages need to be contiguous so that it can become
> a single contiguous data segment for TSO or ???)

No, it just means that the NIC driver has been verified not to call
mtod() on an M_EXTPG mbuf and dereference the resulting data pointer
(which would make it go "boom").

But the page size is only 4K on most platforms. So while an M_EXTPG
mbuf can hold 5 pages (from memory, too lazy to do the math right now)
and reduces socket buffer mbuf chain lengths by a factor of 10 or so
(2K vs. 20K per mbuf), the S/G list that a NIC needs to consume would
likely shrink only by a factor of 2. And even then only if the busdma
code that maps mbufs for DMA is not already coalescing adjacent mbufs.
I know busdma does some coalescing, but I can't recall whether it
coalesces physically adjacent mbufs.

> If TSO and the code beneath it (NIC and maybe mb_unmapped_to_ext() being called) were to
> all work ok for M_EXTPG mbufs, it would be easy to enable that for NFS (non-TLS case).

It does. You should enable it for at least TCP.

Drew
