Issue with huge numbers of connections

M. Warner Losh imp at bsdimp.com
Sun Jun 17 19:03:03 UTC 2007


In message: <46757818.5030005 at joeholden.co.uk>
            Joe Holden <joe at joeholden.co.uk> writes:
: M. Warner Losh wrote:
: > Greetings,
: > 
: > I have a friend who is having problems with a service he's running.
: > He gets billions and billions of connections to this service a day.
: > Somewhere between 10^8 and 10^9 connections, he notices that his
: > servers lose the ability to accept new connections.  These are TCP
: > connections.
: > 
: > This is with FreeBSD 6.1R.  My first question is: does anybody know if
: > the fixes to -current/7.0 have fixed this?  Is there a fix that can be
: > back ported?  He's currently working around the problem by having a
: > number of different machines that reboot in a round robin fashion, but
: > would like a better solution.
: > 
: > Warner
: > _______________________________________________
: Warner, if he hasn't done so already, have you suggested tweaking the
: sysctl variables, such as:
: kern.maxfilesperproc
: kern.ipc.nmbclusters
: kern.maxprocperuid
: kern.maxfiles
: kern.ipc.somaxconn
: kern.maxvnodes
: 
: Tweaking those may help, or he may just be exhausting available
: resources, IIRC it's limited to 65k connections per interface, someone
: correct me if I am wrong.
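
Before changing any of those, it may help to record what they are set
to now.  A minimal sketch, assuming Python is available on the box and
that sysctl(8) prints its usual "name: value" lines; the helper names
parse_sysctl and read_sysctls are made up for illustration:

```python
import subprocess

# The sysctl names suggested in the quoted message above.
SYSCTL_NAMES = [
    "kern.maxfilesperproc",
    "kern.ipc.nmbclusters",
    "kern.maxprocperuid",
    "kern.maxfiles",
    "kern.ipc.somaxconn",
    "kern.maxvnodes",
]


def parse_sysctl(output):
    """Turn 'name: value' lines, as printed by sysctl(8), into a dict."""
    values = {}
    for line in output.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # skip anything that is not a 'name: value' line
            values[name.strip()] = value.strip()
    return values


def read_sysctls(names=SYSCTL_NAMES):
    """Run sysctl(8) for the given names (assumes a FreeBSD host)."""
    result = subprocess.run(
        ["sysctl", *names], capture_output=True, text=True, check=True
    )
    return parse_sysctl(result.stdout)
```

Calling read_sysctls() on the affected machine returns a dict keyed by
sysctl name.  These are limits rather than usage counters, so this only
gives a baseline to compare against after any tuning.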

Here's the bug report I got:

	There is still a vague problem with the FreeBSD network interface --
	especially the part that handles TCP. Something strange happens after
	about a week or so (after handling about 10^8 or 10^9
	connections). The system becomes unreachable for TCP connections. I
	have fixed this problem by having all of the FreeBSD systems reboot
	automatically once a week using a cron job. I have not been able to
	isolate this issue, but I suspect that there is some kind of problem
	with the error handling and some resource gets depleted slowly. I
	realize that this is pretty vague, but I have not been able to find
	out what actually happens in this case.

I believe that each connection lasts on the order of tens or hundreds
of milliseconds, given what I know about the systems in place.  My
earlier paraphrase omitted a few key points.  I suggested that he try
a newer version of FreeBSD, but since these are production systems,
he's hesitant to mess with them...

Doing the math, 10^9 connections in a week translates to ~1650/s, so
at roughly 100ms per connection we'd expect on the order of 100-200
connections in steady state at any time.  I suspect that the peak load
may be up to 100 times that, which is still only about 20000
connections.  The hangs don't seem to happen at a peak, but randomly.
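
For anyone who wants to check the arithmetic, here is a rough sketch
using Little's law (open connections = arrival rate * connection
lifetime).  The 100ms lifetime is my assumption from the "tens or
hundreds of milliseconds" guess, and the factor of 100 is the
suspected peak multiplier; neither is a measured value:

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604800


def arrival_rate(connections, seconds=SECONDS_PER_WEEK):
    """Average number of new connections per second over the period."""
    return connections / seconds


def concurrent(rate_per_s, lifetime_s):
    """Little's law: average number of connections open at once."""
    return rate_per_s * lifetime_s


ASSUMED_LIFETIME_S = 0.1   # assumption: ~100ms per connection
ASSUMED_PEAK_FACTOR = 100  # assumption: peak load is up to 100x average

rate = arrival_rate(10**9)
steady = concurrent(rate, ASSUMED_LIFETIME_S)
peak = steady * ASSUMED_PEAK_FACTOR

print(f"average rate:      {rate:.0f} connections/s")
print(f"steady state:      {steady:.0f} open connections")
print(f"suspected peak:    {peak:.0f} open connections")
```

With a shorter or longer lifetime the steady-state figure scales
linearly, so the conclusion only holds if the lifetime guess is in the
right ballpark.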

Given all that, I'm not sure which of the above to try.

Warner



