Hibernating sockets to support C10M

From: Mark Delany <n6t_at_oscar.emu.st>
Date: Tue, 11 Nov 2025 13:46:52 UTC
This must surely be an old idea, so I'm curious whether much discussion has happened
previously and what conclusions those discussions came to.

This is mostly thinking about huge numbers of relatively idle TCP sockets on servers where
clients wish to stay connected for very long periods of time waiting for infrequent server
pushes. Two examples are RSS and DNS Push Notifications (rfc8765). In these cases a client
might establish a socket and not get server pushed traffic for hours or days.

As I understand it, the main limitation on the number of concurrent TCP sockets a system
can support is memory. I expect that socket buffers are the biggest consumers of kernel
memory, and while user-space usage is very language- and application-dependent, it wouldn't
surprise me if a typical application - even a lean one - requires multiple kB of memory
for each client connection (and quite possibly other non-memory resources).

So the idea is that these idle sockets are "hibernated", which is to say that the
application and the kernel release most of their memory associated with an idle socket.

When the server application makes the decision to hibernate a socket - perhaps based on an
inactivity timer - it releases as many application resources as it can and retains *just*
enough state to reconstitute those resources at a later time. It then calls the kernel to
do the same thing - namely, release as many kernel resources as possible associated with the
socket and retain *just* enough state to reconstitute the socket at a later time.
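
To make that concrete, here's a rough sketch in C of what the application side might look
like. Nothing here exists today - hibernate() is the proposed syscall, and clients[],
save_compact_state() and free_client() are stand-ins for however the application tracks
per-connection state:

    #include <stdio.h>
    #include <time.h>

    #define IDLE_TIMEOUT  (15 * 60)            /* arbitrary: 15 minutes of inactivity */

    struct client { int fd; time_t last_activity; /* ... */ };

    extern struct client *clients[];            /* per-fd table; NULL == hibernated */
    extern void save_compact_state(struct client *);  /* keep a few bytes, keyed by fd */
    extern void free_client(struct client *);
    extern int  hibernate(int fd);              /* the proposed syscall - does not exist */

    void maybe_hibernate(struct client *c, time_t now)
    {
        int fd = c->fd;

        if (now - c->last_activity < IDLE_TIMEOUT)
            return;

        /* Release application resources, retaining *just* enough to rebuild later. */
        save_compact_state(c);
        free_client(c);
        clients[fd] = NULL;

        /* Ask the kernel to do the same on its side. */
        if (hibernate(fd) == -1)
            perror("hibernate");
    }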

Assuming the TCP session is completely idle, the kernel should be able to release most of
the memory associated with the socket. I don't know exactly how much state the kernel
needs to reconstitute an idle socket, but considering the addrs/ports tuple, sequence numbers,
socket buffer size values, socket options and a few other odds and sods, does the state
representation of a socket require much more than a couple of hundred bytes?
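
For what it's worth, a back-of-the-envelope sketch of what that retained state might
contain - the field names and sizes are pure guesswork on my part, not taken from any
actual kernel:

    #include <stdint.h>
    #include <netinet/in.h>

    /* Hypothetical record a kernel might keep for a hibernated TCP socket. */
    struct tcp_hibernate_state {
        struct in6_addr local_addr, remote_addr;  /* 32 bytes: address pair        */
        uint16_t        local_port, remote_port;  /*  4 bytes: port pair           */

        uint32_t snd_nxt, snd_una, rcv_nxt;       /* 12 bytes: sequence/ack state  */
        uint32_t snd_wnd, rcv_wnd;                /*  8 bytes: window sizes        */
        uint32_t sndbuf_size, rcvbuf_size;        /*  8 bytes: buffer sizing       */

        uint16_t mss;                             /* negotiated options...         */
        uint8_t  snd_wscale, rcv_wscale;
        uint32_t ts_recent;                       /* timestamp option state        */
        uint32_t flags;                           /* SACK ok, keepalive, sockopts  */
    };
    /* Well under a couple of hundred bytes, even allowing for padding and bookkeeping. */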

When the kernel sees incoming traffic for the hibernating socket, it reconstitutes the
socket by reallocating socket buffers and so on, then notifies the application that the
socket is readable in the usual way via kqueue(), select(), read(), etc. The server
application recognizes a hibernating socket by fd and reconstitutes the client state prior
to processing the inbound data.
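
A sketch of how that might look in the application's event loop (kqueue() here, though
select()/epoll would be equivalent). reconstitute_client() and process_request() are
assumed helpers, and clients[] is the same per-fd table as in the earlier sketch:

    #include <sys/types.h>
    #include <sys/event.h>
    #include <unistd.h>

    struct client;
    extern struct client *clients[];                     /* per-fd table; NULL == hibernated */
    extern struct client *reconstitute_client(int fd);   /* rebuild app state from the saved bytes */
    extern void process_request(struct client *, const char *, size_t);

    void handle_events(int kq)
    {
        struct kevent evs[64];
        int n = kevent(kq, NULL, 0, evs, 64, NULL);

        for (int i = 0; i < n; i++) {
            int fd = (int)evs[i].ident;

            /* The kernel has already revivified the socket on inbound traffic;
             * the application rebuilds its own per-client state before reading. */
            if (clients[fd] == NULL)
                clients[fd] = reconstitute_client(fd);

            char buf[4096];
            ssize_t len = read(fd, buf, sizeof buf);
            if (len > 0)
                process_request(clients[fd], buf, (size_t)len);
        }
    }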

If the server application wants to send traffic to a hibernating socket, it reconstitutes
the client state and writes to the socket. On seeing the write(), the kernel recognizes a
hibernating socket and revivifies it from the saved state prior to sending the traffic
down through the network stack.
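
Continuing the same sketch, the server-initiated push path is nothing more than rebuilding
the application state and calling write(); the kernel-side revivification on write() is,
of course, the proposed behaviour rather than anything that exists today:

    #include <stdio.h>
    #include <unistd.h>

    struct client;
    extern struct client *clients[];                     /* per-fd table; NULL == hibernated */
    extern struct client *reconstitute_client(int fd);   /* rebuild app state from the saved bytes */

    void push_to_client(int fd, const void *msg, size_t len)
    {
        /* Rebuild application state if this client was hibernated. */
        if (clients[fd] == NULL)
            clients[fd] = reconstitute_client(fd);

        /* Under the proposal, write() on a hibernating socket causes the kernel
         * to revivify it from the saved state before transmitting. */
        if (write(fd, msg, len) == -1)
            perror("write");   /* handle as usual: drop the client, retry, etc. */
    }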

In the best-case scenario, the kernel requires perhaps 200-ish bytes of state memory per
hibernating socket and the application may need as little as 0-8 bytes of state memory if
the fd can be used as an index into disk-based state or a pointer array.

What sort of memory savings can hibernating sockets offer?

If we say that an active socket on a lean server application consumes 100kB of kernel+user
memory and a hibernating socket consumes 0.2kB of kernel+user memory, then the memory
required to support 10M idle sockets drops from 1,000GB to 2GB. More realistically, if we
assume that, say, 80% of the sockets are idle, then the 2M active sockets still need 200GB
and the 8M hibernating ones need about 1.6GB, so the reduction is from 1,000GB to roughly
201GB - which still seems pretty useful.

All that's needed to support this optimization is a hibernate(socketfd) syscall and a
revivify(socketfd) kernel function triggered by inbound traffic and write(). Well, that
and a bunch of code, but you get the idea.


Thoughts?


Mark.