From nobody Thu Jan 13 07:16:09 2022 X-Original-To: net@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 53B8E1944D8E; Thu, 13 Jan 2022 07:16:25 +0000 (UTC) (envelope-from bsd-lists@bsdforge.com) Received: from udns.ultimatedns.net (static-24-113-41-81.wavecable.com [24.113.41.81]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "ultimatedns.net", Issuer "R3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4JZG21048bz4fxv; Thu, 13 Jan 2022 07:16:24 +0000 (UTC) (envelope-from bsd-lists@bsdforge.com) Received: from ultimatedns.net (localhost [127.0.0.1]) by udns.ultimatedns.net (8.16.1/8.16.1) with ESMTP id 20D7GANI031495; Wed, 12 Jan 2022 23:16:16 -0800 (PST) (envelope-from bsd-lists@bsdforge.com) List-Id: Networking and TCP/IP with FreeBSD List-Archive: https://lists.freebsd.org/archives/freebsd-net List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-net@freebsd.org MIME-Version: 1.0 Date: Wed, 12 Jan 2022 23:16:09 -0800 From: Chris To: Gleb Smirnoff Cc: net@freebsd.org, current@freebsd.org Subject: Re: compressed TIME-WAIT to be decomissioned In-Reply-To: References: User-Agent: UDNSMS/17.0 Message-ID: X-Sender: bsd-lists@bsdforge.com Content-Type: multipart/mixed; boundary="=_4ae67900a837bea3c4e6e4d25307e30b" X-Rspamd-Queue-Id: 4JZG21048bz4fxv X-Spamd-Bar: ---- Authentication-Results: mx1.freebsd.org; none X-Spamd-Result: default: False [-4.00 / 15.00]; REPLY(-4.00)[] X-ThisMailContainsUnwantedMimeParts: N --=_4ae67900a837bea3c4e6e4d25307e30b Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=US-ASCII; format=flowed On 2022-01-12 10:48, Gleb Smirnoff wrote: > Hi! Thanks for the informative writeup, Gleb! Untrimmed, sorry... > > [crossposted to current@, but let's keep discussion at net@] > > I have already touched the topic with rrs@, jtl@, tuexen@, rscheff@ and > Igor Sysoev (author of nginx). Now posting for wider discussion. > > TLDR: struct tcptw shall be decomissioned > > Longer version covers three topics: why does tcptw exist? why is it no > longer necessary? what would we get removing it? > > Why does struct tcptw exist? > > When TCP connection goes to TIME-WAIT state, it can only retransmit > the very last ACK, thus doesn't need all of the control data in the kernel. > However, we are required to keep it in memory for certain amount of time > (2*MSL). So, let's save memory: free the socket, free the tcpcb and > leave only inpcb that will point at small tcptw (much smaller than tcpcb) > that holds enough info to retransmit the last ACK. This was done in > early 2003, see 340c35de6a2. > > What was different in 2003 compared to 2022? > > * First of all, internet servers were running i386 with only 2 Gb of KVA > space. Unlike today, they were memory constrained in the first place, not > CPU bound like they are today. > > * Many of HTTP connections were made by older browsers, which were not able > to use persistent HTTP connections. Those browsers that could, would > recycle connections more often, then today. Default timeouts in Apache > for persistent connections were short. So, the ratio of connections > in TIME-WAIT compared to live connections was much bigger than today. > Here is sample data from 2008 provided to me by Igor Sysoev: > > ITEM SIZE LIMIT USED FREE REQUESTS FAILURES > tcpcb: 728, 163840, 22938, 72722, 13029632, 0 > tcptw: 88, 163842, 10253, 72949, 2447928, 0 > > We see that TIME-WAITs are ~ 50% of live connections. > > Today I see that TIME-WAITs are ~ 1% of connections. My data is biased > here, since I'm looking at servers that do mostly video streaming. I'd > be grateful if anybody replies to this email with some other modern data > on ratio between tcpcb and tcptw allocations. > > * The Internet bandwidth was lower and thus average size of HTTP object > much smaller. That made the average send socket buffer size much smaller > than today. Note that TCP socket buffers autosizing came in 2009 only. > This means that today most significant portion of kernel memory consumed > by an average TCP connection is the send socket buffer, and > socket+inpcb+tcpcb is just a fraction of that. Thus, swapping tcpcb to > tcptw we are saving a fraction of a fraction of memory consumed by average > connection. > > * Who told that 2*MSL (60 seconds) is adequate time to keep TIME-WAIT? > In 71d2d5adfe1 I added some stats on usage of tcptw and experimented a bit > with lowering net.inet.tcp.msl. It appeared that lowering it down three > times doesn't have statistically significant effect on TIME-WAIT use > stats. > This means that the already miniscule number of TIME-WAIT connection on a > modern HTTP server can be lowered 3 times more. Feel free to lower > net.inet.tcp.msl and do your own measurements with > 'netstat -sp tcp | grep TIME-WAIT'. I'd be glad to see your results. I think that should be: 'netstat -sp tcp | grep TIME_WAIT' fe; on the system I'm writing this from: up 15:19, coffee# netstat -sp tcp | grep TIME_WAIT 5 connections in TIME_WAIT state > > Ok, now what would removal give us? > > * One less alloc/free during socket lifetime (immediately). > * Reduced code complexity. inp->inp_ppcb always can be dereferenced as > tcpcb. > Lot's of checking for inp->inp_flags & INP_TIMEWAIT goes away > (eventually). > * Shrink of struct inpcb. Today inpcb has some TCP-only data, e.g. HPTS. > Reason for that is obvious - compressed TIME-WAIT. A HPTS-driven > connection > may transition to TIME-WAIT, so we can't use tcpcb. Now we would be able > to. > So, for non TCP connections memory footprint shrinks (with following > changes). > * Embedding inpcb into protocols cb. An inpcb becomes one piece of memory > with > tcpcb. One more less alloc/free during socket lifetime. Reduced code > complexity, since now inpcb == tcpb (following changes). > > How much memory are we going to lose? > > (kgdb) p tcpcb_zone->uz_keg->uk_rsize > $5 = 1064 > (kgdb) p tcptw_zone->uz_keg->uk_rsize > $6 = 72 > (kgdb) p tcpcbstor->ips_zone->uz_keg->uk_rsize > $8 = 424 > > After change a connection in TIME-WAIT would consume 424+1064 bytes instead > of 424+72. Multiply that by expected number of connections in TIME-WAIT on > your machine. > > Comments welcome. --=_4ae67900a837bea3c4e6e4d25307e30b Content-Transfer-Encoding: 7bit Content-Type: application/pgp-keys; name=0xBDE49540.asc Content-Disposition: attachment; filename=0xBDE49540.asc; size=5028 -----BEGIN PGP PUBLIC KEY BLOCK----- mQENBGDTzGEBCADHlXdS4V57s2soaEK2wi3o9rr9zo7to/giBSxCpFYJxOnPkL5A 2ibbvflrL8sWvAczx47wgDS7iIhzICBBRdnXtcFGnoeeriV27LSn+PcpnIB+DaWZ xe+6TDC0Z0JUJ7qDTjUBFzhnQGYlrVvc4WbnWTjJaB1LEwgIX8JqX5S3SX0/oXgs +OtqDuENZ4/a5te5xPnspTv/5NJHjqYGxjHP0Vw0KjRKS1AoJ1SBPSMQV5373AX9 5NzFS+CjqeQhjfHFPeRajQ8t4T6eqhKA7LtKMO1egeAwNehk9ZoEqEBT2+ojuKUd oSuzqvhhx+eUIYLFqoPSzMKR+YbStzergsbnABEBAAG0KUNocmlzIEh1dGNoaW5z b24gPGNocmlzaEB1bHRpbWF0ZWRucy5uZXQ+iQFrBBABCABVBgsJBwgDAgQVCAoC AxYCAQIZAQIbAwIeARgYaGtwczovL2tleXMub3BlbnBncC5vcmcWIQQGJAsyyBlk cuwsSYsYdR58veSVQAUCYNQl+wUJA8LAmgAKCRAYdR58veSVQN3NB/sFTeXrZeDk ml/dshET8QbkOPgXlnibk8+Mauf+y9LjS9WT7R8EmqhK7T7aw115JQ1RWTM6kpQM jyDBjYF7piJEpNKI9YDeSnODKir1fWQqm9+wd68wAKGvV4m8kg9uOHCvXG4J++MG zDFH+PuGVxKirFnaz46DpS0Zw7wTtjNiNFvCooYov3IeYGfqcchd3hwBuXgWLexZ vI8JW7lL9oXl7B/wcbSxg9rwy6/QLYGg6sEtYRcFYyvQWefSMJaLWjU/pZN2iSxM lXm55iZv1BXHupfeD1ldRiGs6ejrcpa8+U1ju291WbLzcIsU8IDljeW9/WB2dLFT hJmY1wRk158AtB5DaHJpcyA8YnNkLWxpc3RzQGJzZGZvcmdlLmNvbT6JAWgEEAEI AFIGCwkHCAMCBBUICgIDFgIBAhsDAh4BGBhoa3BzOi8va2V5cy5vcGVucGdwLm9y ZxYhBAYkCzLIGWRy7CxJixh1Hny95JVABQJg1CX7BQkDwsCaAAoJEBh1Hny95JVA aI0H/AlJAOfc5TcMKa479Itw31mwccKb+u0DPN9Gkm/RfWIBjeqqozxCM8G8jVFr dt/J6KmBO3dQtRZHlXdD57RAfDDl5Vm3uws0s+UIFOxMiua/YxyuDcKLsE8Bjkzx z+vuJ8f6cg4WlygPr3bo3l81AOuU/wOsTrNkQvVJxgATlooATSVxs0yNn2uoso9f nhMGUYsmT4c35JYh0k6Lq7Z2LS+ELipMTQ7M7iCWSP1O/zSEvPD4NBo52xCvjLka KcL4fRl7UN+6ouwGr5aUn83tztE/IR0AK45gFvL5yxI4g/zm1t3j2+hhhW1pBU8w uQWkD2DyLTWy7xs1uVF5m1ojHp60H0NocmlzIDxrbm90QHRhY29tYXdpcmVsZXNz Lm5ldD6JAWgEEAEIAFIGCwkHCAMCBBUICgIDFgIBAhsDAh4BGBhoa3BzOi8va2V5 cy5vcGVucGdwLm9yZxYhBAYkCzLIGWRy7CxJixh1Hny95JVABQJg1CX7BQkDwsCa AAoJEBh1Hny95JVA5m8H/iENaTD4j5QHfaHfiDIdxGx36GnETyRK0vAzr2b6pzG+ 7VHNCm4ZfuMsXDJ1ZD8fjTipvg0f4w31xCQI0NgNdAqudBqE075Jwcr9pE9j8VN1 Nvejto01cgLHODbLPhokrkFz1K023VjCdy5RaVuCZ6ajTif7Kq+BEOE8TumYx4ly zdhnh/9ICohqfVvEMh347wI36D7HuezHB773hOsHdqTy9T+0Qu0Vu+wud45MUy1f vRF11OkJFtKL0bh4yMSGVY1xte1Mt/qC6rd43TDtAW3ekw1o/exh764kp7XXQsmP wwe4Y040PZafcygJlEW9bBtjjxKnzDTvqeb5dMi6d7a0GENocmlzIDxvaWRldkBz dW5vcy5pbmZvPokBaAQQAQgAUgYLCQcIAwIEFQgKAgMWAgECGwMCHgEYGGhrcHM6 Ly9rZXlzLm9wZW5wZ3Aub3JnFiEEBiQLMsgZZHLsLEmLGHUefL3klUAFAmDUJfsF CQPCwJoACgkQGHUefL3klUB74wf8DSvT36bYZp7oqZ+35HNhTekJ2dbTzUhauF0S +Z9R1AGnNnINgua75CyQGdNCIgcZxo4qG9sePl7SllQ9i0qhmiw0mzmvky8bAZQV V/2Coc1C/81b+PI19VczYrbZC20jApsnbAIkKZgSh9XQoiLd3meY7G2lX2k6CXYL xSeBEh+N3BU8vLxExm82U71Qzm43u0kA1TlbTSqpBvg/tfAzTCsYQLSlB6b4ZL2W D6U7b7ZYF5oZNonVNWSHxpjUN3Evkta9xWS2+cgYQdlP1/ku5w5ZWwzmYG7awh0J /YuSNIp6Ks6D/PSBduu6XbH+FJHaXmq+ZCKpNBh5EKH+GhOfq7QfQ2hyaXMgPHBv cnRtYXN0ZXJAYnNkZm9yZ2UuY29tPokBaAQQAQgAUgYLCQcIAwIEFQgKAgMWAgEC GwMCHgEYGGhrcHM6Ly9rZXlzLm9wZW5wZ3Aub3JnFiEEBiQLMsgZZHLsLEmLGHUe fL3klUAFAmDUJfwFCQPCwJoACgkQGHUefL3klUC3GggAo4Y+hslaoV7Namp7qWYZ Vei4ZwPfsYW7/HtmFORSGV8C8xR+LSkwzN1Hc7Qxvwv+DXuk7Hzd1Ag/xe8XhbNG /NMrXENY/8ym9TRbxtrBIhQyhkyShSUT+N+g16GRNZKuNL2MOIHc/RCS/YyyaTtu TzIxFbP7Gb2LO1LiiZsFVOGirHfxyiww7CAm3HXY2K4smOiKs6swZMpStVy3dd6A BcB1LPGs3ywDglFfKCRbVmjsPgsi61r4kUBVO6ML7lAmPDXLXOa+7iAtBN479QxC MVeH3Y3SMrvu61Vyf1xL79rIznU3u8C34zfxqsoIV0zCZe2YDLbFfLhZYqatYYEo e7QjImNocmlzLmgiIDxjaHJpcy5oQHVsdGltYXRlZG5zLm5ldD6JAWgEEAEIAFIG CwkHCAMCBBUICgIDFgIBAhsDAh4BGBhoa3BzOi8va2V5cy5vcGVucGdwLm9yZxYh BAYkCzLIGWRy7CxJixh1Hny95JVABQJg1CX8BQkDwsCaAAoJEBh1Hny95JVAkUEH /jkzYrRh7muqoebwEgVeULzPbAs/nYJm9SMME2ypB2FS8kusO7lE+33UJO7PhHkJ 0nJ+tPfP8UV+fCzVjKjabzpvUGuiMWKRZEK9xNoxwi/epOrRw87msHA2LPqEob+F sVh09Nc58s75koUgSYp5h0FjsLK0+fwsQ6PtTfpY5W6JJVJRQnMwGKk5czrukBSM 79kJvphgul2xuzqo5K7rM98dL75AwCJmJZnbyXpUJIhtY/G01nURupBiQGgNixYs Zeo6OR669TFrMRWxueXtlHD0WaX7JNSlR5uyzpVaDCH0Kxa6ozmZtD+a6dAXg630 zbLGHg51JIm38Uvi1i47Jaa0KCJILlIuIENvbW11bmljYXRpb25zIiA8ZG5zQGRu c3dhdGNoLmNvbT6JAWgEEAEIAFIGCwkHCAMCBBUICgIDFgIBAhsDAh4BGBhoa3Bz Oi8va2V5cy5vcGVucGdwLm9yZxYhBAYkCzLIGWRy7CxJixh1Hny95JVABQJg1CX8 BQkDwsCaAAoJEBh1Hny95JVAABoH/iOWA+9BKxLIAIFgW2nxTFDrGvbxXL/mVSFt SOInKX8UqqfLCcikfpWLsj2D7mg5rKFMCu+31UYYlnrXl4YY1qruq0vh41L72qNy yHYol+xW4BSbZXf2q2ph7+lnPsFoodw7acVun5F8M8NH0roo5AOSbgRlK69ZFIcq fDEJdtk4oul7pqGArdeTCCdrSaeR3zrRN8P0PDOkGKSdlpeOE6XHnbbmAPZIhr/9 KsSpX1BGyipda3k5kOB4TsGVo+cRJMkK+GMpsZ+lJ7ZzRbjHbC+b52TiAIjMtXCK 3A3LrDUeMoJwvRKoO1tzquF6HqHJSg0ArZOvAB3BHlwUyUtA/o25AQ0EYNPMYQEI ANFpucNRdYEOubTNluoK97N9JmDb0WRXPPow+3XfBom6ZBSrWqNBgqDbjxSsLB00 QXbA8EB5W/Oolp/0epwEtgNAxyKVPowE/un+rY1PqvGjeAR4gBhY9Za1Lg1Q3vnR /WzsY7RIQCqhWUbfdGn1u6r/EgTBVrwUp4U/3ggfSz/PcUt4pUhlgxfYvjSjOgEZ wbqaQIwWud11FKMARNAUJzvJL/fDGeKLMvgRUwynIDGzCq7e67hhEEo5jwkZ0gEl 8RxXHKFuYkbb/q7rpdifXYYT6QCFlEZhiRbtH5Us7kgKuRD2XUFEQnN4U/rxuydH 4XOP6iOhiZfYnK/y9HBeRCMAEQEAAYkBPAQYAQgAJgIbDBYhBAYkCzLIGWRy7CxJ ixh1Hny95JVABQJg1CYkBQkDwsDDAAoJEBh1Hny95JVApBsH/iEg2ANRkHByfXB+ sH3PMf2Jsg5NSuj8OiNeKKGGIKCJkSAPjtv5rvKLNcvIcTR5Vnhr0e6AteFcK2te iFWDmj0QuFoQNvIOHQ3nHBPSpai2Ubq12nvYfg4bYK28AMi4xPMssgQ8awFgAI2V k9okq5XwC0Cc1MGhupEWYYSaFLIDQvFvRRSw1Lyc/W3SKa4d2dgesIPnB/rdv0Zq u8ftsSmurKxA2hQeNIcn06Ew7AbWUIjFX/bDXJlg/3Sj/spU2ur23TmaADBKhT5P DvfdaFTkk0SBfpN1j2S0DNXBHSrWvRp15zZmU4hwELiUY/H2/j/XpOGV3Q0i2iob 1hJ30C8= =aMQi -----END PGP PUBLIC KEY BLOCK----- --=_4ae67900a837bea3c4e6e4d25307e30b--