Issue with huge numbers of connections

Maxim Konovalov maxim at macomnet.ru
Sun Jun 17 19:29:14 UTC 2007


On Sun, 17 Jun 2007, 13:02-0600, M. Warner Losh wrote:

> In message: <46757818.5030005 at joeholden.co.uk>
>             Joe Holden <joe at joeholden.co.uk> writes:
> : M. Warner Losh wrote:
> : > Greetings,
> : >
> : > I have a friend who is having problems with a service he's running.
> : > He gets billions and billions of connections to this service a day.
> : > Somewhere between 10^8 and 10^9 connections, he notices that his
> : > servers lose the ability to accept new connections.  These are TCP
> : > connections.
> : >
> : > This is with FreeBSD 6.1R.  My first question is: does anybody know if
> : > the fixes to -current/7.0 have fixed this?  Is there a fix that can be
> : > back ported?  He's currently working around the problem by having a
> : > number of different machines that reboot in a round robin fashion, but
> : > would like a better solution.
> : >
> : > Warner
> : Warner, if he hasn't done so already, have you suggested tweaking the
> : sysctl variables, such as:
> : kern.maxfilesperproc
> : kern.ipc.nmbclusters
> : kern.maxprocperuid
> : kern.maxfiles
> : kern.ipc.somaxconn
> : kern.maxvnodes
> :
> : Tweaking those may help, or he may just be exhausting available
> : resources.  IIRC it's limited to 65k connections per interface; someone
> : correct me if I am wrong.
>
> Here's the bug report I got:
>
> 	There is still a vague problem with the FreeBSD network interface --
> 	especially the part that handles TCP. Something strange happens after
> 	about a week or so (after handling about 10^8 or 10^9
> 	connections). The system becomes unreachable for TCP connections. I
> 	have fixed this problem by having all of the FreeBSD systems reboot
> 	automatically once a week using a cron job. I have not been able to
> 	isolate this issue, but I suspect that there is some kind of problem
> 	with the error handling and some resource gets depleted slowly. I
> 	realize that this is pretty vague, but I have not been able to find
> 	out what actually happens in this case.
>
> I believe that each connection lasts on the order of tens or
> hundreds of milliseconds, given what I know about the systems in place.
> My earlier rephrase omitted a few key points.  I suggested that he
> try to use a newer version of FreeBSD, but since these are
> production systems, he's hesitant to mess with them...
>
> Doing the math on 10^9 connections in a week translates to ~1650/s,
> so we'd expect there are on the order of 100-200 connections steady
> state at any time.  I suspect that the peak load may be up to 100
> times that, which is still only 20000 connections.  The hangs don't
> seem to happen at a peak, but randomly.
>
> Given all that, I'm not sure which of the above to try.
>
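For reference, the arithmetic quoted above can be reproduced with
Little's law (concurrent connections = arrival rate x mean lifetime).
This is only a minimal Python sketch: the function names are mine, and
the 10 to 100 ms lifetimes and the 100x peak factor are the guesses from
the quoted message, not measurements.

```python
# Back-of-the-envelope check of the quoted numbers.
# All inputs are estimates taken from the thread, not measured values.

SECONDS_PER_WEEK = 7 * 24 * 60 * 60


def arrival_rate(connections, seconds=SECONDS_PER_WEEK):
    """Average new connections per second over the given period."""
    return connections / seconds


def steady_state(rate_per_s, lifetime_s):
    """Little's law: mean concurrent connections = rate * mean lifetime."""
    return rate_per_s * lifetime_s


rate = arrival_rate(10**9)
print("average rate: %.0f connections/s" % rate)

# Assumed connection lifetimes (tens to hundreds of milliseconds).
for lifetime_ms in (10, 100):
    concurrent = steady_state(rate, lifetime_ms / 1000.0)
    print("lifetime %3d ms: ~%.0f concurrent, ~%.0f at an assumed 100x peak"
          % (lifetime_ms, concurrent, concurrent * 100))
```

Even with the pessimistic 100 ms lifetime and the 100x peak factor, the
estimate stays in the tens of thousands of concurrent connections.
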
There are several obvious sysctls that can affect this:
net.inet.ip.portrange.randomized, net.inet.ip.portrange.*.

We definitely need more debug info: vmstat -zm, netstat -anp tcp,
netstat -m, sysctl net.inet from his system.  It would be nice if he
could give us a shell on the problem box.
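
Since the report suspects a resource that is depleted slowly, a single
set of outputs may not show much; periodic snapshots that can be
compared later are probably more useful.  Below is a rough Python
sketch of that idea.  The command list is the one requested above; the
helper names (run_one, snapshot), the timeout, and the output layout are
my own assumptions, and it assumes a Python 3.7+ interpreter is
available on the box.

```python
import subprocess
import sys
import time

# Commands requested in this thread (FreeBSD userland assumed).
COMMANDS = [
    ["vmstat", "-zm"],
    ["netstat", "-anp", "tcp"],
    ["netstat", "-m"],
    ["sysctl", "net.inet"],
]


def run_one(argv, timeout=60):
    """Run one command; return its output, or a note if it could not run."""
    try:
        proc = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # keep error messages with the output
            text=True,
            errors="replace",
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return "[could not run: %s]\n" % exc
    return proc.stdout


def snapshot(commands=COMMANDS):
    """Return one timestamped text block with the output of every command."""
    parts = ["=== snapshot at %s ===\n" % time.strftime("%Y-%m-%d %H:%M:%S")]
    for argv in commands:
        parts.append("--- %s ---\n" % " ".join(argv))
        parts.append(run_one(argv))
    return "".join(parts)


if __name__ == "__main__":
    sys.stdout.write(snapshot())
```

Run from cron (say, hourly) with the output appended to a log file, this
would leave a series of snapshots from before the next hang to compare
against each other.
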

-- 
Maxim Konovalov

