connect() returns EADDRINUSE during massive host->host conn rate

Jan Srzednicki w at
Tue Nov 27 05:53:22 PST 2007


I have a pair of hosts. One of them performs a massive amount of
TCP connections to the other one, all to the same port. This setup
mostly works fine, but from time to time (it varies, from once a
minute to once every half hour), the connect(2) syscall fails with
EADDRINUSE. The connection rate tops out at 50 connection

The socket is non-blocking. The application does the standard job of
creating the socket, setting up the relevant fields, setting SO_REUSEADDR
and SO_KEEPALIVE, and setting O_NONBLOCK on the descriptor. No bind(2) is
performed. The connection is initiated from inside a jail (not sure if
that implies an internal bind(2) to the jail's address). There are no
connections from
the other host to the first one.
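
A minimal sketch of the socket setup described above, for anyone who wants
to reproduce it. The helper names make_client_socket() and start_connect()
are hypothetical and are not taken from the actual application; only the
sequence of calls (socket, SO_REUSEADDR, SO_KEEPALIVE, O_NONBLOCK, connect
without bind) follows the description.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket configured as described above.
 * Returns the descriptor, or -1 on failure. */
int make_client_socket(void)
{
    int on = 1;
    int flags;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0 ||
        setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        goto fail;

    /* Switch the descriptor to non-blocking mode. */
    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        goto fail;

    return fd;

fail:
    close(fd);
    return -1;
}

/* Hypothetical helper: start a non-blocking connect with no prior bind(2),
 * so the kernel picks the local address and an ephemeral port.
 * Returns 0 if connected or in progress, -1 with errno set otherwise. */
int start_connect(int fd, const char *ip, unsigned short port)
{
    struct sockaddr_in sin;

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &sin.sin_addr) != 1) {
        errno = EINVAL;
        return -1;
    }

    if (connect(fd, (struct sockaddr *)&sin, sizeof(sin)) == 0)
        return 0;
    return (errno == EINPROGRESS) ? 0 : -1;
}
```

With this shape, an EADDRINUSE failure would surface as start_connect()
returning -1 with errno set to EADDRINUSE.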

I've tried tuning the net.inet.ip.portrange variables: I've increased
the available port range to over 45000 ports (quite a lot, should be more
than enough for just about anything) and I've toggled
net.inet.ip.portrange.randomized off, but that didn't change anything.

The workaround on the application side - retrying on EADDRINUSE - works
pretty well, but hey, from what I know from the Stevens book, that
shouldn't be happening, though Google says all BSDs have a bad habit of
throwing out EADDRINUSE from time to time.
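
The retry workaround can be sketched roughly as follows. This is an
illustration only, not the application's actual code: retry_on_eaddrinuse(),
attempt_fn and the flaky_attempt() stand-in are hypothetical names. The
sketch assumes each attempt creates a fresh socket before calling
connect(2), since the state of a socket after a failed connect is not
something to rely on.

```c
#include <errno.h>

/* Hypothetical callback type: performs one full attempt (create socket,
 * connect) and returns >= 0 on success, or -1 with errno set. */
typedef int (*attempt_fn)(void *ctx);

/* Retry an attempt only while it fails with EADDRINUSE, up to max_tries
 * times. Any other error is returned to the caller immediately. */
int retry_on_eaddrinuse(attempt_fn attempt, void *ctx, int max_tries)
{
    int i;
    int rc = -1;

    if (max_tries <= 0) {
        errno = EINVAL;
        return -1;
    }

    for (i = 0; i < max_tries; i++) {
        rc = attempt(ctx);
        if (rc >= 0 || errno != EADDRINUSE)
            return rc;
    }
    return rc; /* still failing with EADDRINUSE after max_tries */
}

/* Stand-in for a real connect attempt, used only to exercise the loop:
 * fails with EADDRINUSE a fixed number of times, then succeeds. */
struct flaky {
    int failures_left;
    int calls;
};

int flaky_attempt(void *ctx)
{
    struct flaky *f = ctx;

    f->calls++;
    if (f->failures_left > 0) {
        f->failures_left--;
        errno = EADDRINUSE;
        return -1;
    }
    return 0;
}
```

In a real client the callback would wrap the socket creation and the
connect(2) call; a bounded retry count keeps a persistent failure from
looping forever.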

This all happens on a 6.2-RELEASE system. The symptoms are easily
reproducible in my environment.

Is there any known fix for that? If there ain't, can it be fixed? :)

  Jan Srzednicki  ::
  "Remember, remember, the fifth of November"
                                     -- V for Vendetta

More information about the freebsd-stable mailing list