Ephemeral port range (patch)

Tue Mar 4 07:44:55 UTC 2008

At 03:37 a.m. 04/03/2008, Mike Silbersack wrote:

>>While I haven't looked much at the scheme proposed by Amit, I think 
>>there's a "flaw" with the algorithm: IP IDs need to be unique for 
>>{source IP, dest IP, Protocol}. And the algorithm still keeps a 
>>*global* IP ID. That means you'll cycle through the whole IP ID 
>>space when you probably didn't need to.
>
>That is true.  I think we have a time/space tradeoff here, with 
>Amit's algorithm taking more memory and less time than a hash-based 
>algorithm. But I haven't benchmarked one against the other, so it is 
>possible that a double-hash might win in both categories.

(Thinking out loud)
Note that in the case of implementing the double-hash scheme for 
connection-oriented protocols, once you compute the hash for the 
first IP ID to be used for a connection, you could store the result 
of the hash in the TCB, and thus you wouldn't need to recompute this 
"expensive" hash every time you send a packet.

>I think Robert Watson said something about investigating the issue 
>of IP IDs more in the near future.  What I'd like to see (if 
>possible) is that we use Amit's algorithm until we've established a 
>connection with a host, then switch to per-IP state and just use 
>linear IP IDs.  That would seem to provide the least overhead for 
>high speed connections.

I haven't yet looked that much at Amit's approach but, from what I 
have seen, your suggestion makes sense.
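
(To make the TCB-caching idea above concrete, here is a minimal Python sketch, not taken from any real stack. It assumes a double-hash IP ID generator in which one keyed hash gives a per-{source IP, dest IP, Protocol} offset and a second keyed hash picks a slot in a small table of counters. The `Connection` class, the use of HMAC-SHA256, `TABLE_SIZE`, and the secret names are all illustrative assumptions.)

```python
import hashlib
import hmac
import os

TABLE_SIZE = 64                  # assumed size of the shared counter table
SECRET_F = os.urandom(16)        # hypothetical per-boot secrets
SECRET_G = os.urandom(16)
counters = [0] * TABLE_SIZE      # counters shared by all flows


def keyed_hash(secret, src_ip, dst_ip, proto):
    """Return a 32-bit keyed hash of the {src, dst, proto} triple."""
    msg = f"{src_ip}|{dst_ip}|{proto}".encode()
    digest = hmac.new(secret, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


class Connection:
    """Hypothetical stand-in for a TCB."""

    def __init__(self, src_ip, dst_ip, proto=6):
        # Both hashes are computed once, when the connection is created,
        # and cached here so the per-packet path never recomputes them.
        self.offset = keyed_hash(SECRET_F, src_ip, dst_ip, proto)
        self.index = keyed_hash(SECRET_G, src_ip, dst_ip, proto) % TABLE_SIZE

    def next_ip_id(self):
        # Per-packet work: one table lookup, one add, one increment.
        ip_id = (self.offset + counters[self.index]) & 0xFFFF
        counters[self.index] += 1
        return ip_id
```

With this layout, successive packets on one connection get consecutive IDs (mod 65536), and the "expensive" part runs only at connection setup.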

>>That said, at least theoretically speaking, one could argue that 
>>there shouldn't be a problem with simply randomizing the IP ID 
>>number. For connection-oriented protocols, you should be doing 
>>PMTUD, and thus will not care about the IP ID. If your packets are 
>>doing fragmentation, then on links with large bandwidth-delay 
>>products you're already in trouble. For connection-less transport 
>>protocols (e.g., UDP), while they usually do not implement PMTUD, 
>>they also do not implement flow-control or congestion control. So 
>>you are either sending data to a local system (e.g., in a LAN), or 
>>you probably shouldn't be sending data that fast (and then you 
>>shouldn't have problems with trivially randomizing the IP ID).
>
>I have attempted to make that argument before, and it did not go 
>over well with most people.  :)
>
>I think the counter-argument was primarily centered around UDP NFS, 
>which, as you pointed out, is almost always a losing case.

Relying on IP fragmentation for anything that is supposed to be 
reliable and that should work at high speed is...mmm... probably not 
the best idea. ;-)  Other than the classic "fragmentation considered 
harmful", there's a more recent I-D (RFC?) entitled "fragmentation 
considered very harmful" which shows the problems that may arise due 
to fragmentation.

So the thing here is that people want to do the wrong thing, and then 
blame the IP ID generator. ;-)

>>>The double-hash concept sounds pretty good, but there's a major 
>>>problem with it.  If an application does a bind() to get a local 
>>>port before doing a connect(), you don't know the remote IP or the remote port.
>>
>>Yes, this is described in Section 3.5 of our I-D 
>>(http://www.ietf.org/internet-drafts/draft-ietf-tsvwg-port-randomization-01.txt). 
>>Our take is that in that scenario you could simply randomize the 
>>local port (i.e., implement the double-hash scheme, and fall back 
>>to trivial randomization when you face this scenario).
>
>Doh, I will try to read the ENTIRE paper next time before commenting.

No worries.
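
(A rough Python sketch of the fallback discussed above, under stated assumptions: the double-hash selection is used when the remote endpoint is known, and plain randomization is used for a bind() that happens before connect(). The port range, `TABLE_SIZE`, the HMAC-SHA256 hash, and every function and variable name here are illustrative choices, not something the draft or any stack mandates.)

```python
import hashlib
import hmac
import os
import secrets

MIN_PORT, MAX_PORT = 49152, 65535    # assumed ephemeral port range
NUM_PORTS = MAX_PORT - MIN_PORT + 1
TABLE_SIZE = 32                      # assumed counter table size
SECRET_F = os.urandom(16)            # hypothetical secrets
SECRET_G = os.urandom(16)
table = [0] * TABLE_SIZE


def keyed_hash(secret, *fields):
    """Return a 32-bit keyed hash of the given fields."""
    msg = "|".join(str(f) for f in fields).encode()
    return int.from_bytes(hmac.new(secret, msg, hashlib.sha256).digest()[:4], "big")


def pick_local_port(local_ip, remote_ip=None, remote_port=None, in_use=frozenset()):
    """Pick an ephemeral port; `in_use` is the set of ports to skip."""
    if remote_ip is None or remote_port is None:
        # bind() before connect(): the remote endpoint is unknown,
        # so fall back to simple randomization.
        start = secrets.randbelow(NUM_PORTS)
        index = None
    else:
        offset = keyed_hash(SECRET_F, local_ip, remote_ip, remote_port)
        index = keyed_hash(SECRET_G, local_ip, remote_ip, remote_port) % TABLE_SIZE
        start = (offset + table[index]) % NUM_PORTS

    # Scan linearly from the starting point until a free port turns up.
    for step in range(NUM_PORTS):
        port = MIN_PORT + (start + step) % NUM_PORTS
        if port not in in_use:
            if index is not None:
                table[index] += step + 1   # one increment per port tried
            return port
    raise OSError("no free ephemeral port")
```

Calling `pick_local_port("192.0.2.1", "198.51.100.7", 80)` takes the double-hash path, while `pick_local_port("192.0.2.1")` takes the random fallback.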

>>>There's a related "feature" in the BSD TCP stack that all local 
>>>ports are considered equal; even for applications that do a 
>>>connect() call and specify a remote IP/port, we do not let them 
>>>use the same local port to two different remote IPs at the same 
>>>time.  This puts a limit on the total number of outgoing 
>>>connections that one machine can have.
>>
>>mmm... I see. So this could limit the number of outgoing 
>>connections to about (ephemeral_ports/TIME_WAIT). Any objections 
>>against changing this? At least for outgoing connections (i.e., 
>>non-listening sockets), this shouldn't be the case. I'd be 
>>interested in working on this issue...
>
>I don't think anyone is actively working on that problem, so you 
>won't be stepping on anyone's toes by looking into it.  Bring on the patches!

Great! Will do.
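
(To illustrate the difference being discussed, here is a toy Python comparison of the two availability checks. It is only a sketch of the idea, not FreeBSD code: `conns` is a hypothetical set of (local IP, local port, remote IP, remote port) tuples, and both function names are made up for this example.)

```python
def port_free_port_only(conns, local_port):
    """Behaviour described above: a local port is busy if ANY connection uses it."""
    return all(c[1] != local_port for c in conns)


def port_free_four_tuple(conns, local_ip, local_port, remote_ip, remote_port):
    """Proposed check: only an identical four-tuple is a conflict."""
    return (local_ip, local_port, remote_ip, remote_port) not in conns


conns = {("192.0.2.1", 50000, "198.51.100.7", 80)}

# Port 50000 is already used towards 198.51.100.7:80 ...
reuse_port_only = port_free_port_only(conns, 50000)
# ... but a connection to a different remote IP does not collide.
reuse_four_tuple = port_free_four_tuple(conns, "192.0.2.1", 50000, "203.0.113.9", 80)
```

Under the first check the port is refused outright; under the second, the same local port can serve connections to different remote endpoints.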

>There's a piece of low hanging fruit also in that area - we add 
>incoming connections to the local port hash table, even though it 
>seems unlikely that you are going to receive a connection from 
>1.1.1.1:50000->1.1.1.2:80 and then connect from 
>1.1.1.2:80->1.1.1.1:50000.  Those unnecessary additions to the local 
>port hash table would be nice to remove if you're investigating the 
>related issues.

Ok.

>One thing you may or may not have noticed is that FreeBSD keeps 
>TIME_WAIT sockets in a separate zone which has a limited size, so you 
>will not have to worry too much about them clogging up all ephemeral ports.

I had not... but will have a look at it.

Thanks!

--
Fernando Gont
e-mail: fernando at gont.com.ar || fgont at acm.org
PGP Fingerprint: 7809 84F5 322E 45C7 F1C9 3945 96EE A9EF D076 FFF1