Updating our TCP and socket sysctl values...
Matthew Dillon
dillon at apollo.backplane.com
Mon Apr 11 20:52:13 UTC 2011
This is a little late but I should note that FreeBSD had the inflight
limiting code (which I wrote long ago), which could be turned on with a
simple sysctl. It was replaced in September 2010 by the 'pluggable
congestion control algorithm' framework, which I am unfamiliar with, but
which I assume has some equivalent replacement for the functionality.
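From memory, the knobs involved look something like the lines below.
Treat this as a sketch -- the sysctl and module names may have shifted
between releases, so check your own system before relying on them:

    # the old inflight limiter (pre-9.x, as I recall):
    sysctl net.inet.tcp.inflight.enable=1

    # the newer pluggable congestion control framework, e.g. loading an
    # alternate algorithm module and selecting it:
    kldload cc_htcp
    sysctl net.inet.tcp.cc.algorithm=htcp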
Also I believe FreeBSD turns on autosndbuf and autorcvbuf by default
now, which means the buffers ARE being made larger than the defaults
(assuming TCP window scaling is not disabled, otherwise you are
limited to 65535 bytes no matter how big your buffers are).
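For reference, the related sysctls are roughly these (again from
memory, so double-check the names on your release):

    sysctl net.inet.tcp.sendbuf_auto    # 1 = autosize the send buffer
    sysctl net.inet.tcp.recvbuf_auto    # 1 = autosize the receive buffer
    sysctl net.inet.tcp.sendbuf_max     # cap on the autosized send buffer
    sysctl net.inet.tcp.recvbuf_max     # cap on the autosized receive buffer
    sysctl net.inet.tcp.rfc1323         # 1 = window scaling + timestamps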
In any case, the inflight/congestion-control algorithms essentially
remove the buffer bloat issue for tcp transmit buffers. It doesn't
matter HOW big your transmit buffer and the receive side's receive
buffer are. People running servers need to turn on the inflight or
equivalent limiter for several reasons:
* BW x DELAY products are all over the map these days, creating
surges if packet backlog is not controlled (see the rough numbers
after this list). Reasons for this vary, but a good chunk can be
blamed on ISPs and network providers who implement burst/backoff
schemes (COMCAST is a good example, where your downlink bandwidth
is cut in half after around 10 seconds at full bore).
* Default tcp buffer sizes are too small.
* Auto-sized tcp buffers are often too large (from a buffer bloat
perspective).
* Edge routers and other routers in the infrastructure have huge
amounts of buffer memory these days, so drops don't really start
to happen until AFTER the network has become almost unusable.
* Turning it on significantly reduces the packet backlog at choke
points in the network (usually the edge router), and significantly
improves the ability of fair-share and QOS algorithms on the border
router to manage traffic.
That is, nearly all border routers are going to have some form of QOS,
but there is a world of difference between having to implement those
algorithms for 500 simultaneous connections with 3 packets of backlog
per connection versus having to do it with 50 packets of backlog per
connection.
* You don't want to run RED on a border router. RED is designed for
the middle of large switching networks. At the edges there are lots
of other choices that do not require dropping packets randomly.
Plus TCP SACK (which most sites now implement) tends to defeat RED
these days, so when RED is used at all the word 'random' becomes
equivalent to 'frustration'.
* Even if your SERVER has tons of bandwidth, probably a good number of
the poor CLIENTS on the other end of the connection do not. If you
don't control the backlog, guess where all that backlog ends up?
That's right, it ends up on the client-side border routers... for
example, it ends up at the DSLAM if the client is a DSL user. These
edge routers are the very last place where you ever want packet backlog
to accumulate. They don't handle it well.
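To put rough, purely illustrative numbers on the BW x DELAY point from
the first bullet: a fairly typical client-side DSL link might be
1.5 Mbits/sec with an 80ms round trip, which works out to roughly:

    1.5 Mbits/sec x 0.080 sec        =  ~15 KBytes in flight
    ~15 KBytes / ~1500-byte packets  =  ~10 packets

So around ten full-size packets fills that client's entire pipe. A
server that lets 50 packets of backlog accumulate on that one connection
has stuffed several times the pipe's capacity into somebody's queue,
and that somebody is usually the DSLAM or the cable head-end.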
So, basically that means some sort of transmit-side congestion control
needs to be turned on... and frankly should be turned on by default. It
just isn't optional any more.
I've been screwing around with this stuff for a long time. I have
colocated boxes with tons of bw, and servers running out of the house
that don't (though ganging the uplink for U-Verse and COMCAST together
is actually quite nice). I've played with the issue from both sides
of the coin. Clients are a lot happier when servers don't build up
50+ packets of backlog from a single connection (let alone several),
and routers can manage QOS better when the nearby servers don't
blast THEM full of packets.
As an example of this, pulling TCP video streams on my downlink from
several concurrent sources over the native COMCAST or U-Verse links tended
to create problems when those streams came from well-endowed servers.
Eventually the only real solution was to run a VPN to a fast colocated
box and run FAIRQ/PF on both sides to control the bandwidth in both
directions, simply because there are too many video servers out there
which do essentially no bandwidth management at all. Examining the
queue showed that multiple (video) servers out on the internet would
happily build up 50+ packets per connection. Telco and cable providers that
most clients are connected to just can't handle it without something
blowing up.
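Just to give a flavor of the FAIRQ/PF side of it, the relevant pf.conf
bits on each end of that VPN looked something along these lines. The
interface name and bandwidth figure are made up for the example, and the
exact ALTQ/FAIRQ syntax is from memory, so check pf.conf(5) before
copying any of it:

    # shape outbound traffic on the vpn interface to just under the real
    # uplink speed, so the queue builds here where FAIRQ can manage it
    # rather than at the telco's edge gear
    altq on tun0 fairq bandwidth 900Kb queue { q_out }
    queue q_out bandwidth 900Kb fairq(default)
    pass out on tun0 keep state queue q_out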
-Matt