Updating our TCP and socket sysctl values...

Matthew Dillon dillon at apollo.backplane.com
Mon Apr 11 20:52:13 UTC 2011


    This is a little late but I should note that FreeBSD had the inflight
    limiting code (which I wrote long ago) which could be turned on with a
    simple sysctl.  It was replaced in September 2010 with the 'pluggable
    congestion control algorithm' which I am unfamiliar with, but which I
    assume has some equivalent replacement for the functionality.
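
    (For reference: on pre-9.0 systems my limiter was a single sysctl,
    and as I understand it the new framework is driven through the
    net.inet.tcp.cc.* sysctls.  From memory, and subject to checking
    against tuning(7) on your release, it looks something like this:)

        # pre-9.0: enable the bandwidth-delay-product inflight limiter
        sysctl net.inet.tcp.inflight.enable=1

        # 9.0 and later: list the available congestion control modules,
        # load an alternative, and select it
        sysctl net.inet.tcp.cc.available
        kldload cc_cubic
        sysctl net.inet.tcp.cc.algorithm=cubic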

    Also I believe FreeBSD turns on autosndbuf and autorcvbuf by default
    now, which means the buffers ARE being made larger than the defaults
    (assuming that TCP window scaling is not disabled, otherwise you are
    limited to 65535 bytes no matter how big your buffers are).
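
    (Again from memory, and the exact names may differ between releases,
    the knobs involved are roughly:)

        # TCP send/receive buffer autosizing
        sysctl net.inet.tcp.sendbuf_auto=1
        sysctl net.inet.tcp.recvbuf_auto=1
        # RFC 1323 window scaling has to stay enabled or the effective
        # window is capped at 65535 bytes regardless of buffer size
        sysctl net.inet.tcp.rfc1323=1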

    In any case, the inflight/congestion-control algorithms essentially
    remove the buffer bloat issue for tcp transmit buffers.  It doesn't
    matter HOW big your transmit buffer and the receive side's receive
    buffer are.  People running servers need to turn on the inflight or
    equivalent limiter for several reasons:

    * BW x DELAY products are all over the map these days, creating
      surges if packet backlog is not controlled.  Reasons for this
      vary but a good chunk can be blamed on ISPs and network providers
      who implement burst/backoff schemes (COMCAST is a good example,
      where your downlink bandwidth is cut in half after around 10
      seconds at full bore).

    * Default tcp buffer sizes are too small.

    * Auto-sized tcp buffers are often too large (from a buffer bloat
      perspective).

    * Edge routers and other routers in the infrastructure have huge
      amounts of buffer memory these days, so drops don't really start
      to happen until AFTER the network has become almost unusable.

    * Turning it on significantly reduces the packet backlog at choke
      points in the network (usually the edge router), and significantly
      improves the ability of fair-share and QOS algorithms on the border
      router to manage traffic.

      That is, nearly all border routers are going to have some form of QOS,
      but there is a world of difference between having to implement those
      algorithms for 500 simultaneous connections with 3 packets of backlog
      per connection versus having to do it for 50 packets of backlog per
      connection (see the rough arithmetic after this list).

    * You don't want to run RED on a border router; RED is designed for
      the middle of large switching networks.  At the edges there are lots
      of other choices that do not require dropping packets randomly.

      Plus TCP SACK (which most sites now implement) tends to defeat RED
      these days, so when RED is used at all the word 'random' becomes
      equivalent to 'frustration'.

    * Even if your SERVER has tons of bandwidth, probably a good number of
      the poor CLIENTS on the other end of the connection do not.  If you
      don't control the backlog, guess where all that backlog ends up?

      That's right, it ends up on the client-side border routers... for
      example, it ends up at the DSLAM if the client is a DSL user.  These
      edge routers are the very last place where you ever want packet backlog
      to accumulate.  They don't handle it well.
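
    To put rough numbers on that backlog comparison, assuming ~1500 byte
    packets and a 100 Mbit/s link (both made-up but typical figures):

        500 connections x  3 packets x 1500 bytes  ~=  2.25 MB queued
        500 connections x 50 packets x 1500 bytes  ~= 37.5  MB queued

    which at 100 Mbit/s is the difference between roughly 180 milliseconds
    and 3 seconds of standing queue that the border router's QOS machinery
    has to dig through.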

    So, basically that means some sort of transmit-side congestion control
    needs to be turned on... and frankly should be turned on by default.  It
    just isn't optional any more.
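
    (In practice, on a server that means making the setting permanent in
    /etc/sysctl.conf rather than flipping it by hand.  Roughly, and again
    adjusted for whatever your release actually ships:)

        # /etc/sysctl.conf
        # pre-9.0: bandwidth-delay-product transmit limiting
        net.inet.tcp.inflight.enable=1
        # 9.0 and later: select a congestion control module instead
        # (the corresponding cc_* module has to be loaded first)
        #net.inet.tcp.cc.algorithm=cubic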

    I've been screwing around with this stuff for a long time.  I have
    colocated boxes with tons of bw, and servers running out of the house
    that don't (though ganging the uplink for U-Verse and COMCAST together
    is actually quite nice).  I've played with the issue from both sides
    of the coin.  Clients are a lot happier when servers don't build up
    50+ packets of backlog from a single connection (let alone several),
    and routers can manage QOS better when the nearby servers don't
    blast THEM full of packets.

    As an example of this, pulling video over TCP on my downlink from several
    concurrent sources over the native COMCAST or U-Verse links tended to
    create problems when those video streams came from well-endowed servers.

    Eventually the only real solution was to run a VPN to a fast colocated
    box and run FAIRQ/PF on both sides to control the bandwidth in both
    directions, simply because there are too many video servers out there
    which do essentially no bandwidth management at all.  Examining the
    queue showed that multiple (video) servers out on the internet would
    happily build up 50+ packets per connection.  The telco and cable
    networks that most clients are connected to just can't handle that
    without something blowing up.
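
    (For the curious, the PF side of that setup is just ALTQ with the
    FAIRQ scheduler applied outbound on each end of the tunnel, since
    ALTQ only shapes what a box transmits.  The fragment below is only a
    sketch -- the interface name and bandwidth are placeholders and the
    fairq queue options should be double-checked against pf.conf(5) --
    but it shows the shape of it:)

        # /etc/pf.conf fragment (needs ALTQ + FAIRQ compiled into the kernel)
        ext_if = "tun0"        # the VPN tunnel endpoint on this box

        altq on $ext_if fairq bandwidth 900Kb queue { q_out }
        queue q_out fairq(default)

        pass out on $ext_if all queue q_out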

						-Matt


