TCP over UDP

Pieter de Goeje pieter at degoeje.nl
Tue Jul 13 09:53:33 UTC 2010


On Monday 12 July 2010 22:25:25 Sergey Babkin wrote:
> Pieter de Goeje wrote:
> > On Saturday 10 July 2010 14:05:29 Sergey Babkin wrote:
> > > Hi guys,
> > > 
> > > I've got this idea, and I wonder if anyone has done it already,
> > > and if not then why. The idea is to put the TCP logic over UDP.
> > > 
> > > I've done some googling and all I've found is some academic
> > > user-space implementations of TCP that actually try to interoperate
> > > with "real" TCP. What I'm thinking about is different. It's
> > > to use the TCP-derived logic as a portable library that would
> > > do the good flow control, retransmitting, delivery confirmations
> > > etc. over UDP.
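To make the idea concrete: at a minimum, a TCP-derived layer over UDP has to carry a sequence number, an acknowledgement number and an advertised receive window in every datagram, because retransmission, delivery confirmation and flow control are all built on those three fields. A minimal sketch of such a header follows. The rudp_ names, the field widths and the 11-byte wire layout are hypothetical and not taken from any existing library.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>  /* htonl, htons, ntohl, ntohs */

/* Hypothetical flag bits, loosely modelled on TCP's SYN/ACK/FIN. */
enum { HDR_SYN = 0x01, HDR_ACK = 0x02, HDR_FIN = 0x04 };

/* Hypothetical header carried at the front of every UDP payload. */
struct rudp_hdr {
    uint32_t seq;     /* sequence number of the first payload byte */
    uint32_t ack;     /* next byte expected from the peer (valid if HDR_ACK) */
    uint16_t window;  /* receive window advertised to the peer, in bytes */
    uint8_t  flags;   /* HDR_* bits */
};

#define RUDP_HDR_LEN 11  /* 4 + 4 + 2 + 1 bytes on the wire, no padding */

/* Serialize h into buf (at least RUDP_HDR_LEN bytes) in network byte order. */
void rudp_hdr_pack(const struct rudp_hdr *h, unsigned char *buf)
{
    uint32_t seq = htonl(h->seq), ack = htonl(h->ack);
    uint16_t win = htons(h->window);

    memcpy(buf, &seq, 4);
    memcpy(buf + 4, &ack, 4);
    memcpy(buf + 8, &win, 2);
    buf[10] = h->flags;
}

/* Parse a header; returns 0 on success, -1 if the datagram is too short. */
int rudp_hdr_unpack(const unsigned char *buf, size_t len, struct rudp_hdr *h)
{
    uint32_t seq, ack;
    uint16_t win;

    if (len < RUDP_HDR_LEN)
        return -1;
    memcpy(&seq, buf, 4);
    memcpy(&ack, buf + 4, 4);
    memcpy(&win, buf + 8, 2);
    h->seq = ntohl(seq);
    h->ack = ntohl(ack);
    h->window = ntohs(win);
    h->flags = buf[10];
    return 0;
}
```

The payload would follow the header in the same datagram; everything else (retransmission timers, window management) would be built on top of these fields.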
> > 
> > TCP actually scales pretty well. All modern operating systems provide an
> > efficient alternative to select(), for example FreeBSD's kqueue. With a
> > bit of tuning one can effectively handle 100k+
> > TCP connections on a
> 
> Not in a single process though.

There's no reason why not; I know because I've done it before. Obviously there
are some practical problems, one of which you describe below.

> 
> > single system. This mainly has to do with increasing the maximum number
> > of file descriptors and decreasing the maximum send/receive buffer sizes
> > to conserve memory.
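The descriptor-limit part of that tuning can be done from inside the process. A minimal sketch using the POSIX getrlimit()/setrlimit() calls; raise_fd_limit() is a made-up name, and it only moves the soft limit, since raising the hard limit needs privileges.

```c
#include <sys/resource.h>

/* Raise this process's soft RLIMIT_NOFILE to `want`, capped at the hard
 * limit. An unprivileged process may raise its soft limit up to the hard
 * limit; raising the hard limit itself needs privilege, and on FreeBSD the
 * usable number is further bounded by the kern.maxfilesperproc sysctl.
 * Returns 0 on success, -1 on error. */
int raise_fd_limit(rlim_t want)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    if (want > rl.rlim_max)
        want = rl.rlim_max;        /* cannot go past the hard limit */
    if (want <= rl.rlim_cur)
        return 0;                  /* already high enough, never lower it */
    rl.rlim_cur = want;
    return setrlimit(RLIMIT_NOFILE, &rl);
}
```

A server would call it once at startup, e.g. raise_fd_limit(200000), before accepting connections; if the hard limit is lower than requested, the soft limit simply ends up equal to the hard limit.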
> 
> Only in theory. My practical experience goes like this: How
> many parallel clients can our multithreaded server handle?
> Why, it should be easy to scale to almost anywhere, just
> bump the limit on the file descriptors. Bumped, tried. It
> crashes soon after 1000 connections. Why? A week later,
> the investigation has shown that we use PAM, and a PAM library
> for network authentication opens its own socket, and calls
> select() on it. It uses the standard fd_set, so when the socket's
> descriptor number is 1024 or higher, it corrupts the stack. So the only
> way to get a large number of file descriptors is in a very
> controlled environment, making sure not to use any 3rd-party
> or system libraries that might ever want to call select().
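The failure mode comes from fd_set being a fixed-size bitmap. A minimal sketch of the usual defences: a range check for code that has to keep using select(), and a poll()-based wait for code that can be changed. The function names are made up for this example.

```c
#include <errno.h>
#include <poll.h>
#include <sys/select.h>   /* fd_set, FD_SETSIZE */
#include <unistd.h>

/* True if fd can safely be passed to FD_SET(). An fd_set is a fixed-size
 * bitmap for descriptors 0..FD_SETSIZE-1 (1024 by default on FreeBSD and
 * glibc); a larger descriptor makes FD_SET() write past its end. */
int fd_fits_in_fd_set(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}

/* Wait until fd is ready to read using poll(), which takes an array of
 * struct pollfd instead of a bitmap and so has no descriptor ceiling.
 * Returns 1 if ready (readable, hung up or in error, so read() will not
 * block), 0 on timeout, -1 if poll() itself fails. */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd p = { .fd = fd, .events = POLLIN, .revents = 0 };
    int n = poll(&p, 1, timeout_ms);

    return n < 0 ? -1 : n;   /* with one pollfd, n is 0 or 1 */
}

/* poll()-based replacement for the "select() then read()" pattern.
 * Returns bytes read, or -1 with errno set (ETIMEDOUT on timeout). */
ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    int r = wait_readable(fd, timeout_ms);

    if (r == 0) {
        errno = ETIMEDOUT;
        return -1;
    }
    if (r < 0)
        return -1;
    return read(fd, buf, len);
}
```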

A bug in a 3rd-party library is no excuse not to use lots of file descriptors.
You could even isolate the library in a separate process and use
IPC to do the authentication. Your proposed solution to use a userspace TCP 
over UDP library is only a workaround for that problem IMHO.
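A minimal sketch of that isolation, assuming the authentication can be reduced to a yes/no answer for a "user:password" string. check_credentials() is a hypothetical stand-in for the real PAM calls, and the NUL-delimited framing and all function names are made up for this example.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the real library call (e.g. the PAM
 * conversation); only the helper process ever runs it. */
static int check_credentials(const char *user_colon_pass)
{
    return strcmp(user_colon_pass, "alice:secret") == 0;
}

/* Read one NUL-terminated message from a stream socket. Returns its length
 * including the NUL, 0 on EOF, -1 on error or if it does not fit in cap. */
static ssize_t read_msg(int fd, char *buf, size_t cap)
{
    for (size_t i = 0; i < cap; i++) {
        ssize_t n = read(fd, buf + i, 1);
        if (n <= 0)
            return n;
        if (buf[i] == '\0')
            return (ssize_t)(i + 1);
    }
    return -1;
}

/* Fork the helper; returns the parent's end of the socketpair, or -1.
 * Call it early, before the main process has opened thousands of sockets,
 * so the helper inherits few descriptors and select() inside it stays safe. */
int start_auth_helper(pid_t *pid_out)
{
    int sv[2];
    pid_t pid;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    if ((pid = fork()) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {              /* helper: answer until the parent hangs up */
        char req[256];
        close(sv[0]);
        while (read_msg(sv[1], req, sizeof req) > 0) {
            char ok = check_credentials(req) ? '1' : '0';
            if (write(sv[1], &ok, 1) != 1)
                break;
        }
        _exit(0);
    }
    close(sv[1]);
    *pid_out = pid;
    return sv[0];
}

/* Ask the helper. Returns 1 if accepted, 0 if rejected, -1 on IPC error. */
int auth_via_helper(int fd, const char *user_colon_pass)
{
    size_t len = strlen(user_colon_pass) + 1;   /* send the NUL as delimiter */
    char ok;

    if (write(fd, user_colon_pass, len) != (ssize_t)len)
        return -1;
    if (read(fd, &ok, 1) != 1)
        return -1;
    return ok == '1';
}
```

The parent keeps the returned descriptor and calls auth_via_helper() for each login. A crashed helper shows up as -1 (or as SIGPIPE on the write, unless the parent ignores that signal) and can simply be restarted.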

> 
> > TCP provides very good throughput, and it achieves this using large send
> > and receive buffers. Your userspace implementation will need
> > something similar. A few hundred bytes per connection is
> > simply not enough.
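The underlying reason is the bandwidth-delay product: a window-based sender can have at most one window of unacknowledged data in flight per round trip, so to keep a path busy its buffers must hold at least bandwidth times RTT bytes. A small arithmetic sketch; the function names are made up for this example.

```c
#include <stdint.h>

/* Bytes that must be in flight (and buffered) to fill a path:
 * bandwidth in bits/s times round-trip time. */
uint64_t bdp_bytes(uint64_t bits_per_sec, uint32_t rtt_ms)
{
    return bits_per_sec / 8 * rtt_ms / 1000;
}

/* Throughput ceiling of a sender limited to window_bytes per round trip.
 * rtt_ms must be > 0. */
uint64_t max_throughput_bps(uint64_t window_bytes, uint32_t rtt_ms)
{
    return window_bytes * 8 * 1000 / rtt_ms;
}
```

For example, bdp_bytes(100000000, 50) gives 625000 bytes for a 100 Mbit/s path with a 50 ms RTT, while a 500-byte buffer on the same path caps a connection at max_throughput_bps(500, 50) = 80000 bit/s.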
> 
> A hundred bytes or less, just for the state, for a connection
> that isn't transferring anything at the moment. HTTP 1.1 and
> SOAP services on top of it do this: open a connection, and then
> after the first request keep it open in case they want
> to send more requests. The minimum state would be pretty much a pair
> of addresses and sequence numbers, plus whatever tree or hash
> table overhead is used to organize them.
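A minimal sketch of that minimum state, assuming IPv4 and a chained hash table keyed on the address/port pair. The struct layout, table size, hash constants and function names are illustrative choices, not taken from any existing stack.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-connection record for an idle connection: the address
 * pair, the two sequence numbers and one pointer for hash chaining. */
struct conn {
    uint32_t laddr, raddr;     /* local and remote IPv4 address */
    uint16_t lport, rport;     /* local and remote port */
    uint32_t snd_nxt, rcv_nxt; /* next sequence number to send / expect */
    struct conn *next;         /* bucket chain */
};

#define NBUCKETS 65536u        /* power of two so the hash can be masked */

struct conn_table {
    struct conn *bucket[NBUCKETS];
};

static uint32_t conn_hash(uint32_t laddr, uint16_t lport,
                          uint32_t raddr, uint16_t rport)
{
    /* Simple multiplicative mix; a server facing untrusted clients would
     * want a keyed hash so attackers cannot force collisions. */
    uint32_t h = raddr * 2654435761u;
    h ^= ((uint32_t)rport << 16 | lport) * 2246822519u;
    h ^= laddr * 3266489917u;
    return (h ^ (h >> 16)) & (NBUCKETS - 1);
}

void conn_insert(struct conn_table *t, struct conn *c)
{
    uint32_t b = conn_hash(c->laddr, c->lport, c->raddr, c->rport);
    c->next = t->bucket[b];
    t->bucket[b] = c;
}

struct conn *conn_lookup(struct conn_table *t, uint32_t laddr, uint16_t lport,
                         uint32_t raddr, uint16_t rport)
{
    struct conn *c = t->bucket[conn_hash(laddr, lport, raddr, rport)];

    for (; c != NULL; c = c->next)
        if (c->raddr == raddr && c->rport == rport &&
            c->laddr == laddr && c->lport == lport)
            return c;
    return NULL;
}
```

On an LP64 system sizeof(struct conn) is 32 bytes, so a million idle entries come to about 32 MB plus the fixed 512 KiB bucket array, in line with the estimate above.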

It is possible to decrease the send/recv buffer size of a connection when you 
know the connection is going to be idle. I've just tested this and it's 
possible to make both buffers 1 byte in size :)
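A minimal sketch of that adjustment with the standard SO_SNDBUF/SO_RCVBUF socket options. The wrapper names are made up for this example, and the sizes the kernel actually grants are platform-dependent.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Shrink (or restore) the kernel send and receive buffers of a socket,
 * e.g. when a connection is known to be idle. The kernel may adjust the
 * request: Linux doubles it and enforces a minimum. Returns 0 or -1. */
int set_socket_buffers(int fd, int sndbuf, int rcvbuf)
{
    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf, sizeof sndbuf) != 0)
        return -1;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof rcvbuf) != 0)
        return -1;
    return 0;
}

/* Report the sizes actually in effect, after any kernel adjustment. */
int get_socket_buffers(int fd, int *sndbuf, int *rcvbuf)
{
    socklen_t len = sizeof *sndbuf;

    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, sndbuf, &len) != 0)
        return -1;
    len = sizeof *rcvbuf;
    return getsockopt(fd, SOL_SOCKET, SO_RCVBUF, rcvbuf, &len);
}
```

Setting a size explicitly generally disables the kernel's automatic buffer tuning for that socket, so a larger value should be restored with the same call before the connection becomes busy again.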

> 
> > If you want to deal with millions of clients, your protocol had better
> > not have any state at all. A good example of this is DNS.
> 
> DNS is actually a very bad example IMO. It's a very fragile protocol
> that trips over itself all the time. If anything, it's another
> case that could benefit a lot from TCP-over-UDP.

Granted, DNS is plagued by security problems and the hacks to solve them.
Technically, however, a simple (stateless) request/response protocol over UDP
is the way to go if the number of clients is basically unbounded.
It is unfortunate that the connectionless nature of UDP makes naive servers
vulnerable to reflection DoS attacks: the source address of a request can be
spoofed, so a server can be tricked into sending its (often larger) responses
to a victim. That limits the applicability somewhat to private servers, or to
public servers with very well-thought-out protocols.
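A minimal sketch of one such precaution, in the spirit of the DNS truncation (TC) bit: a stateless UDP responder that never sends more bytes than it received, answering too-short requests with a one-byte "truncated" reply instead. The service itself (returning an FNV-1a hash of the request), the ST_ status codes and serve_one() are made-up placeholders.

```c
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

enum { ST_OK = 0, ST_TRUNCATED = 1 };  /* hypothetical reply status codes */

/* FNV-1a, standing in for whatever the real service computes. */
static uint32_t fnv1a(const unsigned char *p, size_t n)
{
    uint32_t h = 2166136261u;
    while (n--) {
        h ^= *p++;
        h *= 16777619u;
    }
    return h;
}

/* Handle one datagram on a bound UDP socket. No per-client state is kept:
 * the reply depends only on the request. The reply is never longer than
 * the request, so spoofed requests gain an attacker no amplification
 * (plain 1:1 reflection is still possible). Returns the number of bytes
 * sent, 0 if the request was empty, -1 on error. */
ssize_t serve_one(int fd)
{
    unsigned char req[512], rep[5];   /* status byte + 32-bit answer */
    struct sockaddr_storage peer;
    socklen_t plen = sizeof peer;
    size_t replen;
    ssize_t n = recvfrom(fd, req, sizeof req, 0,
                         (struct sockaddr *)&peer, &plen);

    if (n <= 0)
        return n;
    if ((size_t)n >= sizeof rep) {    /* full answer is no bigger than request */
        uint32_t h = htonl(fnv1a(req, (size_t)n));
        rep[0] = ST_OK;
        memcpy(rep + 1, &h, sizeof h);
        replen = sizeof rep;
    } else {                          /* tell the client to pad and retry */
        rep[0] = ST_TRUNCATED;
        replen = 1;
    }
    return sendto(fd, rep, replen, 0, (struct sockaddr *)&peer, plen);
}
```

Clients of such a protocol would be expected to pad short requests to at least the size of the reply.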

- Pieter


More information about the freebsd-hackers mailing list