Re: compressed TIME-WAIT to be decommissioned
- Reply: Gleb Smirnoff : "Re: compressed TIME-WAIT to be decommissioned"
- In reply to: Gleb Smirnoff : "compressed TIME-WAIT to be decommissioned"
Date: Thu, 13 Jan 2022 07:16:09 UTC
On 2022-01-12 10:48, Gleb Smirnoff wrote:
> Hi!
Thanks for the informative writeup, Gleb!
Untrimmed, sorry...
>
> [crossposted to current@, but let's keep discussion at net@]
>
> I have already touched the topic with rrs@, jtl@, tuexen@, rscheff@ and
> Igor Sysoev (author of nginx). Now posting for wider discussion.
>
> TLDR: struct tcptw shall be decommissioned
>
> Longer version covers three topics: why does tcptw exist? why is it no
> longer necessary? what would we get removing it?
>
> Why does struct tcptw exist?
>
> When a TCP connection goes to the TIME-WAIT state, it can only retransmit
> the very last ACK, and thus doesn't need all of the control data in the
> kernel. However, we are required to keep it in memory for a certain amount
> of time (2*MSL). So, let's save memory: free the socket, free the tcpcb and
> leave only the inpcb, which will point at a small tcptw (much smaller than
> tcpcb) that holds enough info to retransmit the last ACK. This was done in
> early 2003, see 340c35de6a2.
>
> What was different in 2003 compared to 2022?
>
> * First of all, Internet servers were running i386 with only 2 GB of KVA
> space. They were memory constrained in the first place, not CPU bound like
> they are today.
>
> * Many HTTP connections were made by older browsers, which were not able
> to use persistent HTTP connections. Those browsers that could would
> recycle connections more often than today. Default timeouts in Apache
> for persistent connections were short. So, the ratio of connections
> in TIME-WAIT compared to live connections was much bigger than today.
> Here is sample data from 2008 provided to me by Igor Sysoev:
>
> ITEM    SIZE   LIMIT   USED   FREE  REQUESTS  FAILURES
> tcpcb:   728, 163840, 22938, 72722, 13029632,        0
> tcptw:    88, 163842, 10253, 72949,  2447928,        0
>
> We see that TIME-WAITs are ~ 50% of live connections.
>
> Today I see that TIME-WAITs are ~ 1% of connections. My data is biased
> here, since I'm looking at servers that do mostly video streaming. I'd
> be grateful if anybody replies to this email with some other modern data
> on the ratio between tcpcb and tcptw allocations.
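
A quick check of the 2008 ratio quoted above, as a minimal Python sketch. It assumes, as the text does, that the USED column of the tcpcb zone stands for live connections and the USED column of tcptw for connections in TIME-WAIT. The function and variable names are mine, for illustration only.

```python
# Figures copied from the USED column of the 2008 sample above.
tcpcb_used = 22938  # tcpcb items in use, taken here as live connections
tcptw_used = 10253  # tcptw items in use, i.e. compressed TIME-WAIT entries


def timewait_ratio(tcptw: int, tcpcb: int) -> float:
    """Return TIME-WAIT entries as a fraction of tcpcb entries."""
    return tcptw / tcpcb


ratio_2008 = timewait_ratio(tcptw_used, tcpcb_used)
print(f"TIME-WAIT / tcpcb = {ratio_2008:.1%}")
```

This comes out at roughly 45%, in line with the "~ 50%" estimate. Feeding the same function the USED values from a current 'vmstat -z' would give a comparable modern figure.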
>
> * Internet bandwidth was lower and thus the average size of an HTTP object
> was much smaller. That made the average send socket buffer size much
> smaller than today. Note that TCP socket buffer autosizing only arrived
> in 2009. This means that today the most significant portion of kernel
> memory consumed by an average TCP connection is the send socket buffer,
> and socket+inpcb+tcpcb is just a fraction of that. Thus, by swapping tcpcb
> for tcptw we are saving a fraction of a fraction of the memory consumed by
> an average connection.
>
> * Who said that 2*MSL (60 seconds) is an adequate time to keep TIME-WAIT?
> In 71d2d5adfe1 I added some stats on usage of tcptw and experimented a bit
> with lowering net.inet.tcp.msl. It appeared that lowering it by a factor
> of three has no statistically significant effect on TIME-WAIT use stats.
> This means that the already minuscule number of TIME-WAIT connections on a
> modern HTTP server can be reduced by a further factor of three. Feel free
> to lower net.inet.tcp.msl and do your own measurements with
> 'netstat -sp tcp | grep TIME-WAIT'. I'd be glad to see your results.
I think that should be:
'netstat -sp tcp | grep TIME_WAIT'
For example, on the system I'm writing this from:
up 15:19, coffee# netstat -sp tcp | grep TIME_WAIT
5 connections in TIME_WAIT state
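
For anyone who wants to collect that number over time while experimenting with net.inet.tcp.msl, here is a minimal Python sketch that pulls the count out of the netstat text. It assumes the line looks exactly like the one shown above; the helper name time_wait_count is mine and not part of any tool.

```python
import re
from typing import Optional


def time_wait_count(netstat_output: str) -> Optional[int]:
    """Return N from a 'N connections in TIME_WAIT state' line, or None."""
    match = re.search(
        r"^\s*(\d+) connections? in TIME_WAIT state",
        netstat_output,
        re.MULTILINE,
    )
    return int(match.group(1)) if match else None


# Sample text copied from the output above.
sample = "5 connections in TIME_WAIT state\n"
print(time_wait_count(sample))
```

The function takes the text as an argument rather than running netstat itself, so it can be fed from a pipe or from saved output.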
>
> Ok, now what would removal give us?
>
> * One less alloc/free during socket lifetime (immediately).
> * Reduced code complexity. inp->inp_ppcb can always be dereferenced as
>   tcpcb. Lots of checks for inp->inp_flags & INP_TIMEWAIT go away
>   (eventually).
> * Shrinking of struct inpcb. Today inpcb has some TCP-only data, e.g. HPTS.
>   The reason for that is obvious: compressed TIME-WAIT. An HPTS-driven
>   connection may transition to TIME-WAIT, so we can't keep that data in
>   tcpcb. Now we would be able to. So, for non-TCP connections the memory
>   footprint shrinks (with following changes).
> * Embedding inpcb into the protocol's cb. An inpcb becomes one piece of
>   memory with tcpcb. One more alloc/free saved during socket lifetime.
>   Reduced code complexity, since now inpcb == tcpcb (following changes).
>
> How much memory are we going to lose?
>
> (kgdb) p tcpcb_zone->uz_keg->uk_rsize
> $5 = 1064
> (kgdb) p tcptw_zone->uz_keg->uk_rsize
> $6 = 72
> (kgdb) p tcpcbstor->ips_zone->uz_keg->uk_rsize
> $8 = 424
>
> After the change, a connection in TIME-WAIT would consume 424+1064 bytes
> instead of 424+72. Multiply that by the expected number of connections in
> TIME-WAIT on your machine.
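
The multiplication suggested above can be sketched in a few lines of Python. The three sizes are copied from the kgdb output; the connection count of 10,000 is a made-up example, not a measurement, and the function names are mine.

```python
# Item sizes in bytes, copied from the kgdb output above.
COMMON_SIZE = 424   # tcpcbstor->ips_zone item, present in both cases
TCPCB_SIZE = 1064   # tcpcb_zone item, kept after the change
TCPTW_SIZE = 72     # tcptw_zone item, used today


def timewait_bytes(connections: int, compressed: bool) -> int:
    """Memory held by TIME-WAIT connections, with or without tcptw."""
    per_conn = COMMON_SIZE + (TCPTW_SIZE if compressed else TCPCB_SIZE)
    return connections * per_conn


def extra_bytes(connections: int) -> int:
    """Additional memory needed once tcptw is removed."""
    return timewait_bytes(connections, False) - timewait_bytes(connections, True)


n = 10_000  # hypothetical number of connections in TIME-WAIT
print(f"before: {timewait_bytes(n, True)} bytes")
print(f"after:  {timewait_bytes(n, False)} bytes")
print(f"extra:  {extra_bytes(n)} bytes ({extra_bytes(n) / 2**20:.2f} MiB)")
```

Per connection the difference is 1064 - 72 = 992 bytes, so the extra cost scales linearly with whatever TIME-WAIT count you observe on your own machine.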
>
> Comments welcome.