[PATCH] Add a new TCP_IGNOREIDLE socket option

Wed Jan 30 17:01:06 UTC 2013

On Tuesday, January 29, 2013 6:07:22 pm Andre Oppermann wrote:
> On 29.01.2013 19:50, John Baldwin wrote:
> > On Thursday, January 24, 2013 11:14:40 am John Baldwin wrote:
> > >>>> Agreed, a per-socket option could be more useful than global sysctls in
> > >>>> certain situations.  However, in addition to the per-socket option,
> >>>> could global sysctl nodes to disable idle_restart/idle_cwv help too?
> >>>
> >>> No.  This is far too dangerous once it makes it into some tuning guide.
> >>> The threat of congestion breakdown is real.  The Internet, or any packet
> >>> network, can only survive in the long term if almost all follow the rules
> >>> and self-constrain to remain fair to the others.  What would happen if
> > >>> nobody respected the traffic lights anymore?
> >>
> >> The problem with this argument is Linux has already had this as a tunable
> >> option for years and the Internet hasn't melted as a result.
> >>
> >>> Since this seems to be a burning issue I'll come up with a patch in the
> > >> next few days to add a decaying restartCWND that'll be fair and allow a very
> >>> quick ramp up if no loss occurs.
> >>
> >> I think this could be useful.  OTOH, I still think the TCP_IGNOREIDLE option
> >> is useful both with and without a decaying restartCWND?
> >
> > *ping*
> >
> > Andre, do you object to adding the new socket option?
> 
> Yes, unfortunately I do object.  This option, combined with the inflated
> CWND at the end of a burst, effectively removes much, if not all, of the
> congestion control mechanisms originally put in place to allow multiple
> [TCP] streams to co-exist on the same pipe.  Not having any decay or timeout
> makes it even worse by doing this burst after an arbitrary amount of time
> when network conditions and the congestion situation have certainly changed.

You have completely ignored the fact that Linux has had this as a global
option for years and the Internet has not melted.  A socket option is far more
fine-grained than their tunable (and requires code changes, not something a
random sysadmin can just toggle as "tuning").
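To illustrate the difference in scope: on Linux the global knob is the `net.ipv4.tcp_slow_start_after_idle` sysctl, which affects every connection on the host, whereas the proposed option would be set by one application on one socket. The C sketch below is minimal and rests on a few assumptions. It assumes the option is exposed as `TCP_IGNOREIDLE` at the `IPPROTO_TCP` level and takes an `int` boolean, like other TCP socket options. The helper name `tcp_ignore_idle` is hypothetical. Because the option is only proposed in this thread, the code reports `ENOPROTOOPT` when the headers do not define it.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: opt a single TCP socket out of the
 * slow-start-after-idle restart.  TCP_IGNOREIDLE is the option
 * proposed in this thread; its level (IPPROTO_TCP) and int-boolean
 * argument are assumptions modeled on other TCP socket options.
 */
int tcp_ignore_idle(int fd, int enable)
{
#ifdef TCP_IGNOREIDLE
    int on = enable ? 1 : 0;
    return setsockopt(fd, IPPROTO_TCP, TCP_IGNOREIDLE, &on, sizeof(on));
#else
    /* Headers without the proposed option: report it as unsupported. */
    (void)fd;
    (void)enable;
    errno = ENOPROTOOPT;
    return -1;
#endif
}
```

Only the connection whose owner calls this changes behavior. Every other socket on the host keeps the default idle restart.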

> The primary principle of TCP is to be cooperative with competing streams and
> fairly share bandwidth on a given link.  Whenever the ACK clock comes to a
> halt for some time we must re-probe (slowstart from a restartCWND) the link
> to compensate for our lack of knowledge of the current link and congestion
> situation.  Doing that with a decay function and floor equaling the IW (10
> segments nowadays) gives a rapid ramp up especially on LAN RTTs while avoiding
> a blind burst and subsequent loss cycle.

I understand all that, but it isn't applicable to my use case.  I'm not sharing
the bandwidth with anyone but other connections of my own (and they are all
lower priority than this one).  Also, I have idle periods of hundreds of
milliseconds (larger than an RTT on this cross-continental link, which also has
high bandwidth), so it seems that even a decayed restartCWND will be useless to
me as it will have decayed down to nothing before I finally restart after long
idle periods.
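A decaying restart window with an IW floor can be sketched loosely after the RFC 2861 idea of decaying cwnd after idle periods. The window is halved once for every retransmission timeout (RTO) that elapsed while idle, and it never drops below the initial window. Several details are assumptions made for illustration and are not the patch Andre proposed: the halving rule, the use of RTO as the decay interval, and the function name `restart_cwnd`. Windows are counted in segments.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical restart window after an idle period, in segments.
 * Halves cwnd once per full RTO spent idle and stops at the IW floor.
 * A window already at or below IW is returned unchanged, so a restart
 * never sends more than the pre-idle window.
 */
uint32_t restart_cwnd(uint32_t cwnd, uint32_t idle_ms, uint32_t rto_ms,
                      uint32_t iw)
{
    assert(rto_ms > 0); /* decay interval must be positive */

    if (cwnd <= iw)
        return cwnd;

    while (idle_ms >= rto_ms && cwnd > iw) {
        cwnd /= 2;          /* one halving per elapsed RTO */
        idle_ms -= rto_ms;
    }
    return cwnd > iw ? cwnd : iw; /* clamp to the IW floor */
}
```

Take a 200 ms RTO and an IW of 10 segments. `restart_cwnd(100, 450, 200, 10)` gives 25 segments. An idle gap of several seconds lands on the 10-segment floor, so a long enough idle period degenerates to a plain restart from IW.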

> If you absolutely know that you're the only one on that network and you want
> pure wirespeed then a TCP cc_null module doing away with all congestion control
> may be the right answer.  The infrastructure is in place and it can be selected
> per socket.  Plus it can be loaded as a module and thus doesn't have to be part
> of the base system.

No, I do not think that doing away with all congestion control will work for
my case.  Even though we have a dedicated line, etc., that doesn't mean
congestion is impossible and that I don't want the "normal" feedback to apply
during the non-restart cases.  BTW, I looked at using alternate congestion
control algorithms (cc_cubic and some of the others) first before resorting to
adding this option and they either did not fix the issue or were buggy.
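Both FreeBSD's modular congestion control framework and Linux already support per-socket selection of a congestion control algorithm. They do this through the `TCP_CONGESTION` socket option, which takes the algorithm name as a string. The C sketch below is minimal. The helper name `set_cc_algo` is hypothetical. No `cc_null` module exists in the base system, so any name it might register under is also hypothetical. The `#ifdef` covers platforms whose headers lack the option.

```c
#include <errno.h>
#include <string.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: select a congestion control algorithm for one
 * socket by name, e.g. "newreno" or "cubic" on FreeBSD.  The kernel
 * rejects names of algorithms it does not currently know about.
 */
int set_cc_algo(int fd, const char *name)
{
    if (name == NULL || name[0] == '\0') {
        errno = EINVAL;
        return -1;
    }
#ifdef TCP_CONGESTION
    return setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, name,
                      (socklen_t)strlen(name));
#else
    (void)fd;
    errno = ENOPROTOOPT;
    return -1;
#endif
}
```

For example, after `kldload cc_cubic`, `set_cc_algo(fd, "cubic")` would switch only that connection. Other sockets keep the system default.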

-- 
John Baldwin