Re: compressed TIME-WAIT to be decomissioned
- Reply: Gleb Smirnoff : "Re: compressed TIME-WAIT to be decomissioned"
- In reply to: Gleb Smirnoff : "compressed TIME-WAIT to be decomissioned"
Date: Thu, 13 Jan 2022 07:16:09 UTC
On 2022-01-12 10:48, Gleb Smirnoff wrote:
> Hi!

Thanks for the informative writeup, Gleb!
Untrimmed, sorry...

>
> [crossposted to current@, but let's keep discussion at net@]
>
> I have already touched the topic with rrs@, jtl@, tuexen@, rscheff@ and
> Igor Sysoev (author of nginx). Now posting for wider discussion.
>
> TLDR: struct tcptw shall be decommissioned
>
> Longer version covers three topics: why does tcptw exist? why is it no
> longer necessary? what would we get removing it?
>
> Why does struct tcptw exist?
>
> When a TCP connection goes to TIME-WAIT state, it can only retransmit
> the very last ACK, thus doesn't need all of the control data in the kernel.
> However, we are required to keep it in memory for a certain amount of time
> (2*MSL). So, let's save memory: free the socket, free the tcpcb and
> leave only the inpcb, which will point at a small tcptw (much smaller than
> tcpcb) that holds enough info to retransmit the last ACK. This was done in
> early 2003, see 340c35de6a2.
>
> What was different in 2003 compared to 2022?
>
> * First of all, internet servers were running i386 with only 2 GB of KVA
>   space. They were memory constrained in the first place, not
>   CPU bound like they are today.
>
> * Many HTTP connections were made by older browsers, which were not able
>   to use persistent HTTP connections. Those browsers that could, would
>   recycle connections more often than today. Default timeouts in Apache
>   for persistent connections were short. So, the ratio of connections
>   in TIME-WAIT compared to live connections was much bigger than today.
>   Here is sample data from 2008 provided to me by Igor Sysoev:
>
>   ITEM     SIZE    LIMIT    USED    FREE   REQUESTS  FAILURES
>   tcpcb:    728,  163840,  22938,  72722,  13029632,        0
>   tcptw:     88,  163842,  10253,  72949,   2447928,        0
>
>   We see that TIME-WAITs are ~ 50% of live connections.
>
>   Today I see that TIME-WAITs are ~ 1% of connections.
>   My data is biased here, since I'm looking at servers that do mostly
>   video streaming. I'd be grateful if anybody replies to this email with
>   some other modern data on the ratio between tcpcb and tcptw allocations.
>
> * The Internet bandwidth was lower and thus the average size of an HTTP
>   object much smaller. That made the average send socket buffer size much
>   smaller than today. Note that TCP socket buffer autosizing came in 2009
>   only. This means that today the most significant portion of kernel memory
>   consumed by an average TCP connection is the send socket buffer, and
>   socket+inpcb+tcpcb is just a fraction of that. Thus, swapping tcpcb to
>   tcptw we are saving a fraction of a fraction of the memory consumed by an
>   average connection.
>
> * Who said that 2*MSL (60 seconds) is an adequate time to keep TIME-WAIT?
>   In 71d2d5adfe1 I added some stats on usage of tcptw and experimented a bit
>   with lowering net.inet.tcp.msl. It appeared that lowering it by a factor
>   of three doesn't have a statistically significant effect on TIME-WAIT use
>   stats. This means that the already minuscule number of TIME-WAIT
>   connections on a modern HTTP server can be lowered 3 times more. Feel free
>   to lower net.inet.tcp.msl and do your own measurements with
>   'netstat -sp tcp | grep TIME-WAIT'. I'd be glad to see your results.

I think that should be:
'netstat -sp tcp | grep TIME_WAIT'
For example, on the system I'm writing this from (up 15:19):
coffee# netstat -sp tcp | grep TIME_WAIT
        5 connections in TIME_WAIT state

>
> Ok, now what would removal give us?
>
> * One less alloc/free during socket lifetime (immediately).
> * Reduced code complexity. inp->inp_ppcb always can be dereferenced as
>   tcpcb. Lots of checking for inp->inp_flags & INP_TIMEWAIT goes away
>   (eventually).
> * Shrink of struct inpcb. Today inpcb has some TCP-only data, e.g. HPTS.
>   Reason for that is obvious - compressed TIME-WAIT. A HPTS-driven
>   connection may transition to TIME-WAIT, so we can't use tcpcb. Now we
>   would be able to.
>   So, for non-TCP connections the memory footprint shrinks (with following
>   changes).
> * Embedding inpcb into the protocol's cb. An inpcb becomes one piece of
>   memory with tcpcb. One more alloc/free saved during socket lifetime.
>   Reduced code complexity, since now inpcb == tcpcb (following changes).
>
> How much memory are we going to lose?
>
> (kgdb) p tcpcb_zone->uz_keg->uk_rsize
> $5 = 1064
> (kgdb) p tcptw_zone->uz_keg->uk_rsize
> $6 = 72
> (kgdb) p tcpcbstor->ips_zone->uz_keg->uk_rsize
> $8 = 424
>
> After the change a connection in TIME-WAIT would consume 424+1064 bytes
> instead of 424+72. Multiply that by the expected number of connections in
> TIME-WAIT on your machine.
>
> Comments welcome.
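
For anyone who wants to plug in their own numbers, here is a rough sketch of the arithmetic quoted above. The item sizes come from the kgdb output; the function names are made up for illustration, and the connection count is only an example (the tcptw USED figure from the 2008 sample, which was measured with different structure sizes).

```python
# Zone item sizes in bytes, taken from the kgdb output quoted above.
INPCB_SIZE = 424
TCPCB_SIZE = 1064
TCPTW_SIZE = 72


def timewait_bytes(connections: int, compressed: bool) -> int:
    """Memory held by `connections` sockets sitting in TIME-WAIT.

    compressed=True models today's inpcb+tcptw layout,
    compressed=False models inpcb+tcpcb after tcptw is removed.
    """
    per_conn = INPCB_SIZE + (TCPTW_SIZE if compressed else TCPCB_SIZE)
    return connections * per_conn


def extra_bytes(connections: int) -> int:
    """Additional memory needed once compressed TIME-WAIT is removed."""
    return timewait_bytes(connections, False) - timewait_bytes(connections, True)


if __name__ == "__main__":
    # Illustrative count only, not a measurement from a current system.
    n = 10253
    print(f"{n} TIME-WAIT connections: +{extra_bytes(n) / 2**20:.1f} MiB")
```

The per-connection difference is simply 1064 - 72 bytes, so the total scales linearly with whatever 'netstat -sp tcp | grep TIME_WAIT' reports on your machine.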