BAD state/State failure with large number of requests

Thu Sep 28 16:01:02 PDT 2006

Hi,

Thank you very much for your fast response.

Daniel Hartmeier wrote:

> The client is not honouring the 2MSL quiet period, the time it should
> wait before re-using the same source port to connect to the same
> destination address/port, as required by the TCP RFCs.
> 
> The reason for that is quite likely that it has run out of random high
> source ports. The range used should be about 49152-65535 (sysctl
> net.inet.ip.portrange.*), and 10,000 connections is getting close. The
> client stack can either make ap fail in connect(2), or re-use source ports
> and violate the RFCs in this case.

You're absolutely correct, that seems to be my problem. Increasing the 
range allows me to get a lot more requests through.
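
As a rough sanity check on the numbers above, here is a small sketch of how much of the source port range 10,000 connections would occupy. The default range is the one quoted above (with 65535 as the highest valid port); the widened range starting at 10000 is only a hypothetical example, not the value I actually configured.

```python
def port_count(first: int, last: int) -> int:
    """Number of source ports in the inclusive range first..last."""
    if not (0 < first <= last <= 65535):
        raise ValueError("ports must satisfy 0 < first <= last <= 65535")
    return last - first + 1


# Default high range as quoted in the thread.
default_ports = port_count(49152, 65535)
# Hypothetical widened range, chosen only for illustration.
widened_ports = port_count(10000, 65535)

connections = 10_000  # size of the test run discussed above

for label, ports in (("default", default_ports), ("widened", widened_ports)):
    share = connections / ports
    print(f"{label}: {ports} ports, {share:.0%} used by {connections} connections")
```

This only counts ports; it ignores any ports already taken by other connections on the client.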

> Not sure if this is a realistic test, i.e. whether you see the very same
> problem in production (with 'BAD state' messages for SYN packets), it
> would only occur if one client is establishing connections to the same
> server port at high concurrency and/or rate. If not, I'd say the test is
> simply flawed, and you need multiple clients to simulate realistically.

I've been suspecting that the test is flawed, but I couldn't put my 
finger on it. However, I also need a way to actually test my 
application with a lot of requests and I wouldn't want to buy another 
server farm for that ;)

> pf keeps state entries around for a while after a connection has been
> closed (to catch packets related to the old connection that might arrive
> late), the timeout is tcp.closed, 90s by default. You can make pf purge
> such state entries sooner by lowering this timeout.

That timeout seems awfully long to me. Is there some standard that 
mandates such a long timeout? At least for testing I will definitely 
lower that, too.
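
To see what lowering the timeout would buy, here is a back-of-the-envelope sketch. It assumes, as a simplification, that every closed connection keeps its source port unusable for the full tcp.closed timeout when talking to a single destination address/port, so the sustainable rate is roughly ports divided by timeout. The 30s and 10s values are hypothetical candidates, not recommendations.

```python
def max_connection_rate(ports: int, hold_seconds: float) -> float:
    """Rough ceiling on new connections per second to one destination,
    assuming each source port is unusable for hold_seconds after close."""
    if ports <= 0 or hold_seconds <= 0:
        raise ValueError("ports and hold_seconds must be positive")
    return ports / hold_seconds


# Default high port range mentioned in the thread (inclusive).
PORTS = 65535 - 49152 + 1

# 90s is the tcp.closed default quoted above; 30s and 10s are hypothetical.
for timeout in (90, 30, 10):
    rate = max_connection_rate(PORTS, timeout)
    print(f"tcp.closed={timeout:>2}s -> about {rate:.0f} connections/s")
```

Under these assumptions the default settings cap a single client at well under 200 new connections per second to the same server port, which fits with the test running into trouble at this scale.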

Thanks again, Rolf.