svn commit: r295136 - in head: sys/kern sys/netinet sys/sys usr.bin/netstat

Alfred Perlstein alfred at freebsd.org
Tue Feb 2 21:21:54 UTC 2016



On 2/2/16 1:09 PM, Slawa Olhovchenkov wrote:
> On Tue, Feb 02, 2016 at 12:35:47PM -0800, Alfred Perlstein wrote:
>
>>> I would second John's comment on the necessity of the change though;
>>> if one already has 32K of *backlogged* connections, it's probably not
>>> very useful to allow more coming in.  It sounds like the application
>>> itself is seriously broken, and unless expanding the field has some
>>> performance benefit, I don't think it should stay.
>> Imagine a hugely busy image board like 2ch.net, if there is a single
>> hiccup, it's very possible to start dropping connections.
> In reality you start dropping connections in any case: nobody will
> wait indefinitely on accept (the user closes the browser and goes
> away, etc.).
>
> Also, if you have more than 4K backlogged connections, you have a
> problem: you can't process all the connection requests, and in the
> next second you will have 8K, after the next second 12K, and so on.
>
Thank you Slawa,

I am pretty familiar with what you are describing: "cascade 
failures". However, to understand why such a change makes sense, 
let me give you a little early history lesson on a project I developed 
under FreeBSD, and then explain why such a project would probably not 
work with FreeBSD as a platform today (we would have to use Linux or 
custom patches).

Here is that use case:

Back in 1999 I wrote a custom webserver using FreeBSD that was 
processing over 1500 connections per second.

What we were doing was tracking web hits using "hidden gifs".  Now this 
was 1999 with only 100mbit hardware and a pentium 400mhz.  Mind you I 
was doing this with cpu to spare, so having an influx of additional hits 
was OK.

Meaning I could easily deal with backlog.

What was important about this case was that EVERY time we served the 
data we were able to monetize it, which paid my salary at the time, 
which was spent working on SMP for FreeBSD and a bunch of other 
patches.  Any lost hits / broken connections would cost us money, 
which in turn meant less time on FreeBSD and less time fixing things 
to scale.

In our case the user would not really know if our "page" didn't load 
because we were just an invisible gif.

So back to the example, let's scale that out to today's numbers.

100mbit -> 10GigE, so that would be 1500 conn/sec -> 150,000 conn/sec.  
At that rate, roughly 0.2 seconds of latency of any sort is enough to 
overflow a 32K listen queue and start dropping connections.
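As a quick sanity check on that figure, here is the back-of-the-envelope arithmetic (the 32768 queue limit is the old signed-16-bit ceiling; the connection rate is the scaled estimate above):

```python
# Back-of-the-envelope check of the scaling argument above.
# Assumptions: a full listen backlog of 32768 entries (the old
# 16-bit limit) and the scaled 150,000 conn/sec rate.

BACKLOG_LIMIT = 32768      # old signed-16-bit ceiling on the listen queue
CONN_PER_SEC = 150_000     # 1500 conn/sec scaled 100x for 100mbit -> 10GigE

seconds_to_overflow = BACKLOG_LIMIT / CONN_PER_SEC
print(f"{seconds_to_overflow:.2f}s")  # about 0.22 seconds
```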

Now, when you still have CPU to spare, because connections *are* 
precious, it makes sense to slightly over-provision the servers so 
that some backlog can be processed.

So, in today's day and age, it really does make sense to allow for 
buffering more than 32k connections, particularly if the developer knows 
what he is doing.
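A minimal sketch of what that looks like from the application side (the backlog value and port handling here are illustrative; on FreeBSD the kernel clamps whatever the application requests to the kern.ipc.somaxconn sysctl, which is the cap this change is concerned with):

```python
import socket

# Sketch: an application requesting a listen backlog larger than 32K.
# The kernel clamps the requested value to kern.ipc.somaxconn, so the
# effective queue depth depends on how that sysctl (and the field
# backing it) is sized -- which is the point of allowing values
# above 32K.

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.bind(("127.0.0.1", 0))   # bind to any free port for illustration
s.listen(100_000)          # request a >32K backlog; the kernel may clamp it
port = s.getsockname()[1]
print("listening on port", port)
s.close()
```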

Does this help explain the reasoning?

thanks!

-Alfred



More information about the svn-src-head mailing list