Re: git: 0d7445193abc - main - tcp: remove tcptw, the compressed timewait state structure

From: Gleb Smirnoff <glebius_at_freebsd.org>
Date: Fri, 07 Oct 2022 02:46:56 UTC
  Hi

On Fri, Oct 07, 2022 at 02:35:38AM +0000, Gleb Smirnoff wrote:
T>     The memory savings the tcptw brought back in 2003 (see 340c35de6a2) no
T>     longer justify the complexity required to maintain it.  For longer
T>     explanation please check out the email [1].
T>     
T>     Surpisingly through almost 20 years the TCP stack functionality of
T>     handling the TIME_WAIT state with a normal tcpcb did not bitrot.  The
T>     existing tcp_input() properly handles a tcpcb in TCPS_TIME_WAIT state,
T>     which is confirmed by the packetdrill tcp-testsuite [2].
T>     
T>     This change just removes tcptw and leaves INP_TIMEWAIT.  The flag will
T>     be removed in a separate commit.  This makes it easier to review and
T>     possibly debug the changes.
T>     
T>     [1] https://lists.freebsd.org/archives/freebsd-net/2022-January/001206.html
T>     [2] https://github.com/freebsd-net/tcp-testsuite
T>     
T>     Differential revision:  https://reviews.freebsd.org/D36398

The memory savings cut here are more than just sizeof(struct tcpcb) -
sizeof(struct tcptw).  We also keep the socket around for duration of TIME_WAIT.
And the stuff that may hang off the socket, e.g. kTLS context.

I prepared a changeset that would free the socket when transitioning to TIME_WAIT:

https://reviews.freebsd.org/D36887

However, the patch adds back some complexity that we just removed.

It also introduces a problem with two recent features of pluggable TCP stacks
and massive setsockopt: we got sysctl_setsockopt() that would run setsockopt on
a pcb matching certain criteria. One of most common applications of this tool
is to switch from one TCP stack to the other. To execute sosetopt() the
sysctl_setsockopt() needs socket. With the compressed timewait structure, we
didn't have it. Now we don't, too. However, the compressed timewaits were
short-circuited to the default TCP stack and didn't depend on the pluggable
ones. With a regular tcpcb we still reference the stack. That makes it impossible
to switch off ALL existing connections from some stack with sysctl_setsockopt().

After todays discussion with mtuexen@, rscheff@ and peterlei@ we decided not to
rush with checking in D36887 and have more discussion whether the memory savings
worth the comlexity or not. But keep the revision around.

-- 
Gleb Smirnoff