kern/174087: Problems with ephemeral port selection
Keith Arner
vornum at gmail.com
Mon Dec 3 15:40:00 UTC 2012
>Number: 174087
>Category: kern
>Synopsis: Problems with ephemeral port selection
>Confidential: no
>Severity: non-critical
>Priority: low
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Mon Dec 03 15:40:00 UTC 2012
>Closed-Date:
>Last-Modified:
>Originator: Keith Arner
>Release: 7.2
>Organization:
Panasas
>Environment:
FreeBSD pa-twin-19a 7.2-RELEASE FreeBSD 7.2-RELEASE #0: Mon Apr 19 16:24:09 EDT 2010 root at perf-x3:/usr/obj/usr0/jimz/freebsd-c-rack/sys/PANASAS amd64
>Description:
Date: Fri, 30 Nov 2012 09:09:08 -0500
From: Keith Arner <vornum at gmail.com>
To: freebsd-net at freebsd.org
Subject: Problems with ephemeral port selection
Message-ID: <CAEo_tUH9LPzPFP-O=317rYEQ3nT66b4biQshV_8=L8hReO_BLg at mail.gmail.com>
I've noticed some issues with ephemeral port number selection from
tcp_connect(), which limit the number of concurrent, outgoing connections
that can be established (connect(), rather than accept()). Sifting through
the source code, I believe the issues stem from two problems in the
tcp_connect() code path. Specifically:
1) The wrong function gets called to determine if a given ephemeral
port number is currently usable.
2) The ephemeral port number gets selected without considering the
foreign addr/port.
Curiously, the effect of #1 mostly cancels the effect of #2, such that
the common calling convention gives you a correct result so long as you
only have a small number of outgoing connections. However, once you get to
a large number of outgoing connections, things start to break down. (I'll
define large and small later.)
As a side note, I have been working with FreeBSD 7.2. The implementations
of several of the relevant functions have been refactored somewhere between
7.2-RELEASE and 9-STABLE, but the core problems in the logic seem to be
the same between versions.
For problem #1, the code path that selects the ephemeral port number is:
tcp_connect() ->
in_pcbbind() ->
in_pcbbind_setup() ->
in_pcb_lport() [not in FreeBSD 7.2] ->
in_pcblookup_local()
There is a loop in in_pcb_lport() [or directly in in_pcbbind_setup() in
earlier releases] that considers candidate ephemeral port numbers and
calls in_pcblookup_local() to determine if a given candidate is suitable.
The default behaviour (if the caller has not set either SO_REUSEADDR or
SO_REUSEPORT) is to pick a local port number that is not in use by
*any* local TCP socket.
So long as the number of concurrent, outgoing connections is less than the
range configured by `sysctl net.inet.ip.portrange.*`, selecting a totally
unique ephemeral port number works OK. However, you cannot exceed that
limit, even if each outgoing connection has a unique faddr/fport. This
does not limit the number of connections that can be accept()'ed, only the
number of connections that can be connect()'ed.
In this particular path, I think the code should call in_pcblookup_hash(),
rather than in_pcblookup_local(). The criteria in in_pcblookup_hash() only
match if the full 5-tuple matches, rather than just the local port number.
The complication, of course, comes from the fact that in_pcbbind() is
called both from bind() and for the implicit bind that happens for a
connect(). The matching criteria in in_pcblookup_local() make sense for
the former but not quite for the latter.
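To make the difference between the two matching criteria concrete, here is a
minimal user-space sketch. It is a toy model, not the kernel code: struct
toy_pcb, port_in_use() and tuple_in_use() are hypothetical names invented for
illustration, and the protocol element of the 5-tuple is left out because the
model is TCP-only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a TCP pcb (not struct inpcb). */
struct toy_pcb {
    uint32_t laddr, faddr;   /* local / foreign IPv4 address */
    uint16_t lport, fport;   /* local / foreign port */
};

/* Port-only match: roughly the default check described above, where a
 * candidate is rejected if any socket already uses that local port. */
static bool port_in_use(const struct toy_pcb *tab, size_t n, uint16_t lport)
{
    assert(tab != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (tab[i].lport == lport)
            return true;
    return false;
}

/* Full-tuple match: a candidate conflicts only if every element matches. */
static bool tuple_in_use(const struct toy_pcb *tab, size_t n,
                         const struct toy_pcb *cand)
{
    assert(tab != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (tab[i].lport == cand->lport && tab[i].laddr == cand->laddr &&
            tab[i].fport == cand->fport && tab[i].faddr == cand->faddr)
            return true;
    return false;
}
```

With one existing connection from local port 50000 to remote port 80, a
candidate using local port 50000 toward remote port 443 is rejected by the
port-only check but accepted by the full-tuple check.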
I mentioned that the above is the default behaviour you get when you don't
specify SO_REUSEADDR or SO_REUSEPORT. Setting SO_REUSEADDR
before calling connect() has some surprising consequences (surprising in the
sense that I don't believe SO_REUSEADDR is supposed to have any effect
on connect()). In this case, when in_pcblookup_local() is called, wild_okay
is set to false. This changes the matching criteria to (in effect) allow
tcp_connect() to use the full 5-tuple space. However, this brings us to the
second problem.
Problem #2 is that the ephemeral port number is chosen before the
fport/faddr gets set on the pcb; that is, tcp_connect() calls in_pcbbind() to
select the ephemeral port number, *then* calls in_pcbconnect_setup() to
populate the fport/faddr. With SO_REUSEADDR, in_pcbbind() can select
an in-use local port. If the local port is used by a socket with a different
laddr/fport/faddr, all is good. However, if the local port selection
results in a full conflict, it will get rejected by the call to
in_pcblookup_hash() inside
in_pcbconnect_setup(). This happens *after* the loop inside
in_pcbbind(), so the call to tcp_connect() fails with EADDRINUSE. Thus,
with SO_REUSEADDR, connect() can fail with EADDRINUSE long before
the ephemeral port space has been exhausted. The application could re-try
the call to connect() and likely succeed, as a new local port would be
selected.
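The application-side workaround could look roughly like the sketch below.
connect_retry() is a hypothetical helper, not an existing API. Opening a fresh
socket for each attempt is an assumption made for portability, since the state
of a socket after a failed connect() is not something to rely on; the retry
count is likewise arbitrary.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: connect to dst with SO_REUSEADDR set, retrying with
 * a new socket whenever connect() fails with EADDRINUSE.
 * Returns a connected descriptor, or -1 with errno set. */
static int connect_retry(const struct sockaddr_in *dst, int max_tries)
{
    assert(dst != NULL && max_tries > 0);

    for (int attempt = 0; attempt < max_tries; attempt++) {
        int s = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
        if (s < 0)
            return -1;

        int on = 1;
        if (setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) == 0 &&
            connect(s, (const struct sockaddr *)dst, sizeof(*dst)) == 0)
            return s;

        int saved = errno;       /* close() may clobber errno */
        close(s);
        errno = saved;
        if (saved != EADDRINUSE) /* only the port conflict is worth retrying */
            return -1;
    }
    errno = EADDRINUSE;
    return -1;
}
```

A caller would use connect_retry(&sin, 3) in place of the socket(),
setsockopt() and connect() sequence, and treat a return of -1 as a real
failure.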
Overall, this behaviour hinders the ability to open a large number of
outbound connections:
* If you don't specify SO_REUSEADDR, you have a fairly limited maximum
number of outbound connections.
* If you do specify SO_REUSEADDR, you are able to open a much larger
number of outbound connections, but must retry on EADDRINUSE.
I believe that the logic under tcp_connect() should be modified to:
- behave uniformly whether or not SO_REUSEADDR has been set
- allow outgoing connection requests to re-use a local port number, so
long as the remaining elements of the tuple (laddr, fport, faddr) are
unique
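As a rough illustration of the proposed behaviour, the sketch below selects
the local port with the foreign address and port already known, so a port is
skipped only when the full tuple is taken. It is a toy model under stated
assumptions: struct toy_conn and pick_lport() are hypothetical names, the scan
is sequential rather than randomized, and listening sockets are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one established connection. */
struct toy_conn {
    uint32_t laddr, faddr;   /* local / foreign IPv4 address */
    uint16_t lport, fport;   /* local / foreign port */
};

/* Return the first port in [first, last] for which
 * (laddr, port, faddr, fport) is not already in tab, or 0 if none is free. */
static uint16_t pick_lport(const struct toy_conn *tab, size_t n,
                           uint32_t laddr, uint32_t faddr, uint16_t fport,
                           uint16_t first, uint16_t last)
{
    assert(first != 0 && first <= last);

    /* A 32-bit counter avoids wrapping when last is 65535. */
    for (uint32_t p = first; p <= last; p++) {
        size_t i;
        for (i = 0; i < n; i++)
            if (tab[i].lport == p && tab[i].laddr == laddr &&
                tab[i].faddr == faddr && tab[i].fport == fport)
                break;           /* full tuple already in use */
        if (i == n)
            return (uint16_t)p;  /* no existing connection conflicts */
    }
    return 0;
}
```

In this model a local port already used toward one remote port stays
available for a connection to a different remote port, which is the re-use
described in the list above.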
==========
Follow-up from the freebsd-net mailing list:
Date: Sat, 01 Dec 2012 11:31:31 -0300
From: Fernando Gont <fernando at gont.com.ar>
To: Keith Arner <vornum at gmail.com>
Cc: freebsd-net at freebsd.org
Subject: Re: Problems with ephemeral port selection
Message-ID: <50BA14C3.4070601 at gont.com.ar>
In-Reply-To: <CAEo_tUH9LPzPFP-O=317rYEQ3nT66b4biQshV_8=L8hReO_BLg at mail.gmail.com>
References: <CAEo_tUH9LPzPFP-O=317rYEQ3nT66b4biQshV_8=L8hReO_BLg at mail.gmail.com>
Hi, Keith,
On 11/30/2012 11:09 AM, Keith Arner wrote:
>
> - behave uniformly whether or not SO_REUSEADDR has been set
> - allow outgoing connection requests to re-use a local port number, so
> long as the remaining elements of the tuple (laddr, fport, faddr) are
> unique
Please take a look at the discussion on how to "steal" incoming
connections in Section 3.1 of RFC 6056.
Cheers,
--
Fernando Gont
e-mail: fernando at gont.com.ar || fgont at si6networks.com
PGP Fingerprint: 7809 84F5 322E 45C7 F1C9 3945 96EE A9EF D076 FFF1
>How-To-Repeat:
connect() a large number of sockets, specifying SO_REUSEADDR before
calling connect(). Note that the call to connect() fails with
EADDRINUSE long before we run into any resource exhaustion.
Then connect() a large number of sockets, without specifying
SO_REUSEADDR (while all the previous sockets are still open). Note
that connect() then fails with EADDRNOTAVAIL; this occurs as soon
as the total number of outgoing connections equals the ephemeral
port range.
#include <sys/types.h>
#include <sys/socket.h>
#include <stdio.h>
#include <errno.h>
#include <stdlib.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <net/if.h>
#include <arpa/inet.h>

int last_child = -1;

/* Return from the calling function with the given value. */
#define complain(exit_val) \
    do { \
        return (exit_val); \
    } while (0)

/* Enable a boolean socket option; returns 0 on success, -1 on failure. */
int SockOpt(int s, int level, int opt)
{
    int opt_val = 1;
    int ret = setsockopt(s, level, opt, &opt_val, sizeof(opt_val));
    if (ret) {
        perror("Could not setsockopt() on socket");
        complain(-1);
    }
    return 0;
}

/* Open a listening TCP socket on the given port; returns the descriptor. */
int open_server(int port)
{
    int ret;
    /* Zero-initialize so that unset fields (e.g. sin_zero) are not garbage. */
    struct sockaddr_in sin = { 0 };

    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_ANY);
    sin.sin_port = htons(port);

    int server = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (server < 0) {
        perror("Could not open server socket");
        complain(-1);
    }
    SockOpt(server, SOL_SOCKET, SO_REUSEADDR);
    ret = bind(server, (struct sockaddr *)&sin, sizeof(sin));
    if (ret) {
        perror("Could not bind() server socket");
        complain(-1);
    }
    ret = listen(server, 5);
    if (ret) {
        perror("Could not listen() server socket");
        complain(-1);
    }
    return server;
}

/* Establish one loopback connection to the given port and accept() it. */
int cycle_client(int server, int iteration, int port, int reuse)
{
    int ret;
    /* Zero-initialize so that unset fields (e.g. sin_zero) are not garbage. */
    struct sockaddr_in sin = { 0 };

    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sin.sin_port = htons(port);

    int client = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (client < 0) {
        fprintf(stderr, "Iteration %d, errno %d: ", iteration, errno);
        perror("Could not open client socket");
        complain(-1);
    }
    if (reuse) {
        SockOpt(client, SOL_SOCKET, SO_REUSEADDR);
    }
    ret = connect(client, (struct sockaddr *)&sin, sizeof(sin));
    if (ret) {
        fprintf(stderr, "Iteration %d, errno %d: ", iteration, errno);
        perror("Could not connect() client socket");
        complain(-1);
    }

    /* accept() reads len as the size of the address buffer, so it must be
     * initialized before the call. */
    socklen_t len = sizeof(sin);
    int child = accept(server, (struct sockaddr *)&sin, &len);
    if (child < 0) {
        fprintf(stderr, "Iteration %d, errno %d: ", iteration, errno);
        perror("Could not accept() child socket");
        complain(-1);
    }

    /* Why are we not closing the sockets?
     *
     * The point of this program is to illustrate the behaviour of the
     * network stack when we open (or, rather connect()) a large number of
     * outgoing sockets. Thus, we want the sockets to linger around, to
     * consume ephemeral port numbers. Note that we could get largely
     * similar behaviour by closing the sockets (if we close the client
     * socket first), as the pcbs would linger in the TIME_WAIT state,
     * consuming ephemeral port numbers.
     *
     * Note that because TIME_WAIT connections count against us, the
     * behaviour being illustrated does not rely on a large number of
     * concurrent connections, just a large number of outgoing connections
     * established over a short time period. But it is easier to understand
     * the operation of this program if we leave the sockets open.
     */

    /*
    ret = close(client);
    if (ret) {
        fprintf(stderr, "Iteration %d, errno %d: ", iteration, errno);
        perror("Could not close() client");
        complain(-1);
    }
    */

    /*
    if (last_child) {
        ret = close(child);
        if (ret) {
            fprintf(stderr, "Iteration %d, errno %d: ", iteration, errno);
            perror("Could not close() child");
            complain(-1);
        }
    }
    */

    last_child = child;
    return 0;
}

/* Main loop to illustrate ephemeral port number behaviour. */
int main(int argc, char **argv)
{
    /* num_iterations: How many sockets do we want to try to open per remote
     * port number? Should be set higher than the number of unique
     * ephemeral port numbers that the stack can choose from. With the
     * default FreeBSD settings, that works out to:
     *
     * net.inet.ip.portrange.last: 65535
     * net.inet.ip.portrange.first: 49152
     *
     * 65535 - 49152 + 1 = 16384 port numbers (the range is inclusive)
     */
    int num_iterations = 20 * 1000;

    /* num_ports: How many distinct remote ports do we want to connect to? */
    int num_ports = 2;

    /* port: base, remote port number to connect to */
    int port = 12345;

    /* reuse: Should we set SO_REUSEADDR before calling connect()?
     * Note that we alternate this value for each remote port, to
     * illustrate the differences in behaviour between setting it or not. */
    int reuse = 1;

    int port_loop;
    for (port_loop = 0; port_loop < num_ports; port_loop++) {
        /* Set up a listening socket on the next remote port number. */
        int server = open_server(port);
        int i = 0;
        for (; i < num_iterations; i++) {
            /* Open a bunch of sockets; and bail out on the first failure. */
            if (cycle_client(server, i, port, reuse)) {
                break;
            }
        }

        /* How many connections did we manage to establish on this port
         * number (and with this "reuse" setting)? If all is working,
         * we ought to be able to establish as many connections as there
         * are ephemeral ports, and we ought to be able to do so for each
         * remote port number (barring memory exhaustion problems). */
        fprintf(stderr, "port %d; reuse %d; opened %d\n",
                port, reuse, i);

        /* Advance to the next remote port, and toggle whether we set
         * SO_REUSEADDR. */
        port++;
        reuse = !reuse;
    }
    return 0;
}
>Fix:
>Release-Note:
>Audit-Trail:
>Unformatted: