tcp failing to recover from a packet loss under 8.2-RELEASE?

Slawa Olhovchenkov slw at zxy.spb.ru
Fri Aug 5 07:23:08 UTC 2011


On Fri, Aug 05, 2011 at 12:02:18AM +1000, Lawrence Stewart wrote:

> > Setting net.inet.tcp.reass.maxsegments=8148 and rerunning the
> > tests appears to result in a solid 14MB/s; it's still running a
> > full soak test but looking very promising :)
> 
> This is exactly the necessary tuning required to drive high BDP links 
> successfully. The unfortunate problem with my reassembly change was that 
> by removing the global count of reassembly segments and using the uma 
> zone to enforce the restrictions on memory use, we wouldn't necessarily 
> have room for the last segment (particularly if a single flow has a BDP 
> larger than the max size of the reassembly queue - which is the case for 
> you and Slawa).
> 
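The relation between a flow's BDP and the reassembly limit described above can be checked with a little arithmetic. This is only a rough sketch: the helper names are hypothetical, and it assumes that after a single loss roughly one BDP worth of segments can arrive out of order and need queue entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bytes that must be in flight to keep the path full. */
static uint64_t
bdp_bytes(uint64_t bytes_per_sec, uint64_t rtt_ms)
{
	return (bytes_per_sec * rtt_ms / 1000);
}

/* The same quantity expressed in MSS-sized segments, rounded up. */
static uint64_t
bdp_segments(uint64_t bytes_per_sec, uint64_t rtt_ms, uint64_t mss)
{
	assert(mss > 0);
	return ((bdp_bytes(bytes_per_sec, rtt_ms) + mss - 1) / mss);
}

/*
 * Assumption: one lost segment can leave about one BDP of later
 * segments waiting in the reassembly queue of a single flow.
 */
static bool
reass_limit_too_small(uint64_t maxsegments, uint64_t bytes_per_sec,
    uint64_t rtt_ms, uint64_t mss)
{
	return (bdp_segments(bytes_per_sec, rtt_ms, mss) > maxsegments);
}
```

For example, with the 14MB/s reported in this thread and an assumed 100 ms RTT and 1448 byte MSS (neither figure is from the thread), bdp_segments() gives 967, so a hypothetical limit of 512 would be too small for that one flow while 8148 would not.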
> This is bad as Andre explained in his message, as we could stall 
> connections. I hadn't even considered the idea of allocating on the 
> stack as Andre has suggested in his patch, which I believe is an 
> appropriate solution to the stalling problem assuming the function 
> will never return with the stack allocated tqe still in the reassembly 
> queue. My longer term goal is discussed below.
> 
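The stack allocation idea discussed above can be illustrated with a small userland model. This is not Andre's patch and not the kernel code; the struct and function names are hypothetical, and overlaps, duplicates and sequence wraparound are ignored. The point it tries to show is that the stack entry is used only for the segment that fills the hole, and is never linked into the queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins, not the kernel structures. */
struct seg_ent {
	struct seg_ent *next;
	unsigned int seq;
	unsigned int len;
};

struct conn {
	struct seg_ent *queue;	/* out-of-order segments, sorted by seq */
	unsigned int rcv_nxt;	/* next sequence number expected */
};

/* Stand-in for a zone that has hit its limit. */
static bool zone_exhausted = false;

static struct seg_ent *
zone_alloc(void)
{
	return (zone_exhausted ? NULL : malloc(sizeof(struct seg_ent)));
}

/* Returns the number of bytes handed to the application. */
static unsigned int
reass_input(struct conn *c, unsigned int seq, unsigned int len)
{
	struct seg_ent stack_ent, *e, *q, **pp;
	unsigned int delivered;

	assert(c != NULL);
	e = zone_alloc();
	if (e == NULL) {
		if (seq != c->rcv_nxt)
			return (0);	/* out of order and no memory: drop */
		e = &stack_ent;		/* in order: fall back to the stack */
	}
	e->seq = seq;
	e->len = len;
	e->next = NULL;

	if (seq != c->rcv_nxt) {
		/* Only heap entries reach this point, so queueing is safe. */
		for (pp = &c->queue; *pp != NULL && (*pp)->seq < seq;
		    pp = &(*pp)->next)
			;
		e->next = *pp;
		*pp = e;
		return (0);
	}

	/* In order: deliver it directly, it is never put in the queue. */
	delivered = len;
	c->rcv_nxt += len;
	if (e != &stack_ent)
		free(e);

	/* Drain whatever has now become contiguous. */
	while ((q = c->queue) != NULL && q->seq == c->rcv_nxt) {
		c->queue = q->next;
		c->rcv_nxt += q->len;
		delivered += q->len;
		free(q);
	}
	return (delivered);
}
```

In this model a connection whose queue already holds a later segment can still make progress when the zone is exhausted: the missing segment is accepted through the stack entry and the queued data behind it is drained.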
> > So I suppose the question is should maxsegments be larger by
> > default due to the recent changes e.g.
> > - V_tcp_reass_maxseg = nmbclusters / 16;
> > + V_tcp_reass_maxseg = nmbclusters / 8;
> >
> > or is the correct fix something more involved?
> 
> I'm not sure if bumping the value is appropriate - we have always 
> expected users to tune their network stack to perform well when used in 
> "unusual" scenarios - a large BDP fibre path still being in the 
> "unusual" category.
> 
> The real fix which is somewhere down on my todo list is to make all 
> these memory constraints elastic and respond to VM pressure, thus 
> negating the need for a hard limit at all. This would solve many if not 
> most of the TCP tuning problems we currently have with one fell swoop 
> and would greatly reduce the need for tuning in many situations that 
> currently are in the "needs manual tuning" basket.

Autotuning without limits is a bad idea: it opens the way to DoS.
Could this problem be solved by preallocating a "hidden" tqe element
for a segment that is received in the valid order and is ready to be
sent to the application? I.e., when creating the reassembly queue for
a TCP connection, we allocate one queue element (with room for the
data payload) that is used only when data is ready for the
application. It would be allocated in the queue so as not to break
the ABI (no change to struct tcpcb).
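A rough userland sketch of the preallocation idea, to make it concrete. All names here are hypothetical and this is not the kernel code; the zone is replaced by a simple counter, and the reserve entry lives in the queue structure rather than in struct tcpcb.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins, not the kernel structures. */
struct seg_ent {
	unsigned int seq;
	unsigned int len;
};

struct reass_queue {
	struct seg_ent *reserve;	/* "hidden" element, preallocated */
	bool reserve_busy;
	unsigned int rcv_nxt;		/* next sequence number expected */
};

/* Stand-in for the limited zone: number of entries still available. */
static int zone_left = 0;

static struct seg_ent *
zone_alloc(void)
{
	if (zone_left <= 0)
		return (NULL);
	zone_left--;
	return (malloc(sizeof(struct seg_ent)));
}

/* Called when the reassembly queue of a connection is created. */
static int
reass_queue_init(struct reass_queue *q, unsigned int rcv_nxt)
{
	assert(q != NULL);
	q->reserve = malloc(sizeof(*q->reserve));
	if (q->reserve == NULL)
		return (-1);
	q->reserve_busy = false;
	q->rcv_nxt = rcv_nxt;
	return (0);
}

/*
 * Get an entry for a segment.  The reserve is handed out only when the
 * zone is exhausted and the segment is the next one expected, i.e. its
 * data can go to the application right away.
 */
static struct seg_ent *
reass_get_entry(struct reass_queue *q, unsigned int seq)
{
	struct seg_ent *e;

	e = zone_alloc();
	if (e != NULL)
		return (e);
	if (seq == q->rcv_nxt && !q->reserve_busy) {
		q->reserve_busy = true;
		return (q->reserve);
	}
	return (NULL);		/* caller drops the segment */
}

static void
reass_put_entry(struct reass_queue *q, struct seg_ent *e)
{
	if (e == q->reserve) {
		q->reserve_busy = false;
	} else {
		free(e);
		zone_left++;
	}
}
```

The cost of this approach is one extra element per connection that has a reassembly queue, which stays bounded by the number of connections, so the hard limit on the zone itself can remain in place.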

> Andre and Steven, I'm a bit too sleepy to properly review your combined 
> proposed changes right now and will follow up in the next few days instead.
> 
> Cheers,
> Lawrence

