Dropped/Duplicate SYN, Cisco PIX/ASA, and random ISN w/ net.inet.ip.random_id=1

Fri Jul 17 23:01:05 UTC 2009


  We recently worked closely with some FreeBSD developers to track down
an elusive bug in the TCP stack.

In a high-performance environment, we observed dropped (or extremely
delayed) SYN packets, but were unable to reproduce the problem easily
with test cases.

Our environment: 
 - FreeBSD 7.x servers
 - FreeBSD 6.x clients
 - PIX/ASA 7.2.x stateful firewalls
 - pf(4) on the server with lots of jails
 - PHP clients and server with SOAP framework, so lots and lots 
   of sockets, often thousands between any given client->server,
   in various TCP states.

Getting to the heart of the matter, see:


Here we drop SYNs from [client:source_ephemeral_socket] if:

 1) We already have that exact combination in CLOSE_WAIT
 2) The ISN of the new incoming SYN is lower than that of
    the existing socket in CLOSE_WAIT
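
For the archives, here is a toy model of those two conditions in Python.
It is only a sketch of the rule as described above, not the kernel code:
the names (Lingering, syn_is_dropped, seq_lt) are made up for
illustration, and it assumes "lower" means the usual 32-bit wraparound
comparison of TCP sequence numbers.

```python
from dataclasses import dataclass

SEQ_MOD = 1 << 32  # TCP sequence numbers are 32-bit and wrap around


def seq_lt(a: int, b: int) -> bool:
    """True if sequence number a is 'before' b, allowing for wraparound."""
    return ((a - b) % SEQ_MOD) >= (1 << 31)


@dataclass(frozen=True)
class Lingering:
    """Hypothetical record of an old connection the server still remembers."""
    state: str  # e.g. "CLOSE_WAIT"
    seq: int    # sequence number recorded for the old connection


def syn_is_dropped(table: dict, four_tuple: tuple, isn: int) -> bool:
    """Apply the two conditions: same 4-tuple lingering, and a lower ISN."""
    old = table.get(four_tuple)
    if old is None:
        return False  # no lingering socket for this exact combination
    return old.state == "CLOSE_WAIT" and seq_lt(isn, old.seq)


if __name__ == "__main__":
    key = ("10.0.0.1", 50000, "10.0.0.2", 80)  # made-up addresses and ports
    table = {key: Lingering("CLOSE_WAIT", 1000)}
    print(syn_is_dropped(table, key, 999))   # lower ISN, same 4-tuple
    print(syn_is_dropped(table, key, 1001))  # higher ISN, same 4-tuple
```

Under this model, a fully random ISN compares as "lower" about half the
time, so the second condition stops being rare as soon as the first one
is met.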

Those conditions are _highly_ unlikely, until you start hedging your

net.inet.ip.random_id=1 in sysctl.conf(5) is one way to exacerbate the
problem.  So are the magic scrubbing bubbles in pf.conf(5): scrub all
random-id.  Also, the PIX/ASA code randomizes IDs by default as well(*).

net.inet.ip.portrange.randomized is another, since truly randomized
numbers can involve duplicates.
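
To put rough numbers on that, here is a back-of-the-envelope sketch in
Python. It assumes each new connection picks its source port uniformly
at random and that the server still holds state for a fixed number of
old ports from the same client; both are simplifications, and the
function name and the figures in the example are invented for
illustration, not measured.

```python
def collision_probability(lingering: int, range_size: int,
                          new_connections: int) -> float:
    """Chance that at least one of `new_connections` randomly chosen
    source ports lands on one of `lingering` ports the server still
    remembers, out of `range_size` possible ports."""
    if range_size <= 0:
        raise ValueError("range_size must be positive")
    per_connection = min(lingering / range_size, 1.0)
    return 1.0 - (1.0 - per_connection) ** new_connections


if __name__ == "__main__":
    # Illustrative values only: 1000 lingering sockets, 100 new connections,
    # compared across a smaller and a larger ephemeral port range.
    for size in (16384, 55536):
        p = collision_probability(1000, size, 100)
        print(f"range of {size} ports: P(at least one reuse) = {p:.3f}")
```

The per-connection chance is simply lingering / range_size, so a wider
port range lowers the odds in direct proportion.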

Additionally, the default ephemeral source port range is far too small
for these HPC environments, which makes collisions more likely, so that
can be increased:


Anyway, this discussion is strictly for the benefit of the mailing list
archives, in case, further down the road, someone else finds themselves
tcpdump(8)'ing duplicate SYNs, staring at netstat(8) -s output, and
beginning to doubt their own existence.

 ~Brian A. Seklecki

(*) To disable TCP sequence number randomization on the Cisco PIX/ASA:

tcp-map verify-chksum
 exceed-mss drop
 syn-data drop
 tcp-options selective-ack allow 
 urgent-flag clear
 no ttl-evasion-protection
icmp unreachable rate-limit 1 burst-size 1
timeout xlate 3:00:00
timeout conn 12:00:00 half-closed 0:10:00 udp 0:01:00 icmp 0:00:02
timeout sunrpc 0:10:00 h323 0:05:00 h225 1:00:00 mgcp 0:05:00 
policy-map global_policy
 class my_inspection_tcp
  set connection embryonic-conn-max 2048 per-client-max 1024\  
    per-client-embryonic-max 1024 random-sequence-number disable
  set connection timeout embryonic 0:02:00 tcp 1:30:00 dcd 24:00:00 5 
  set connection advanced-options verify-chksum
service-policy global_policy interface [WhateverIF]

