[CFT] Early Retransmit for TCP (rfc5827) patch

Scheffenegger, Richard rs at netapp.com
Tue May 31 09:10:23 UTC 2011


Hi Weongyo,

Good to know that you are addressing the primary reason for
retransmission timeouts with SACK.

(Small window (early retransmit) is ~70%, lost retransmission ~25%,
end-of-stream loss ~5% of all addressable causes for an RTO).

I looked at your code to enable RFC5827 Early Retransmits.

There is one minor nit-pick: tcp_input is calling tcp_getrexmtthresh for
every duplicate ACK. When SACK is enabled (over 90% of all sessions
today), the byte-based tcp_sack_ownd routine cycles over the entire SACK
scoreboard.

As the scoreboard can become huge with fat, long pipes, this appears to
be suboptimal. 

Perhaps something along these lines:

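ackedbyte = 0;
tcp_seq mark = tp->snd_una;	/* end of the previous hole */
/* Bytes between consecutive holes are SACKed; stop once enough are seen. */
TAILQ_FOREACH(p, &tp->snd_holes, scblink) {
  ackedbyte += p->start - mark;
  if (ackedbyte >= amount)
    return(TRUE);
  mark = p->end;
}
/* Count the SACKed bytes between the last hole and snd_fack. */
ackedbyte += tp->snd_fack - mark;
if (ackedbyte >= amount)
  return(TRUE);
return(FALSE);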

This would be more scalable (only the holes at the start of the
scoreboard need to be traversed, which also increases the chance that
they stay in the CPU cache)...
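
For anyone who wants to try the early-exit idea outside the kernel, here
is a minimal user-space sketch. It is not taken from the patch: struct
hole and sacked_at_least() are hypothetical stand-ins for the scoreboard
entries and the check, and a sorted array replaces the TAILQ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a scoreboard hole: [start, end) is not SACKed. */
struct hole {
	uint32_t start;
	uint32_t end;
};

/*
 * Return true as soon as at least `amount` bytes between snd_una and
 * snd_fack are known to be SACKed.  The holes must be sorted by sequence
 * number.  Unsigned subtraction keeps the arithmetic safe across
 * sequence-number wraparound.
 */
static bool
sacked_at_least(const struct hole *holes, size_t nholes,
    uint32_t snd_una, uint32_t snd_fack, uint32_t amount)
{
	uint32_t sacked = 0;
	uint32_t mark = snd_una;	/* end of the previous hole */
	size_t i;

	for (i = 0; i < nholes; i++) {
		/* Bytes between the previous hole and this one are SACKed. */
		sacked += holes[i].start - mark;
		if (sacked >= amount)
			return (true);	/* early exit: skip the remaining holes */
		mark = holes[i].end;
	}
	/* SACKed bytes between the last hole and the highest SACKed byte. */
	sacked += snd_fack - mark;
	return (sacked >= amount);
}
```

With holes at [1000, 2000) and [3000, 4000), snd_una = 1000 and
snd_fack = 5000, a request for 1000 bytes is satisfied at the second
hole without looking any further.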

Perhaps adding a variable to track the number of bytes SACKed to the
scoreboard (and updated with the receipt of a new SACK block) would be
even more efficient....
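
A rough sketch of that counter idea, again in user space and with
made-up names (struct sack_counter and its helpers are assumptions, not
existing kernel fields): the SACK processing code would report how many
bytes each new SACK block newly covers, and how many previously SACKed
bytes a cumulative ACK has passed, so the threshold check becomes a
single comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical running total of bytes currently SACKed above snd_una. */
struct sack_counter {
	uint32_t sacked_bytes;
};

/* Call when a new SACK block covers `newly_sacked` bytes not SACKed before. */
static void
sack_counter_add(struct sack_counter *sc, uint32_t newly_sacked)
{
	sc->sacked_bytes += newly_sacked;
}

/* Call when a cumulative ACK passes `now_acked` bytes that were SACKed. */
static void
sack_counter_ack(struct sack_counter *sc, uint32_t now_acked)
{
	assert(now_acked <= sc->sacked_bytes);
	sc->sacked_bytes -= now_acked;
}

/* O(1) replacement for walking the scoreboard on every duplicate ACK. */
static bool
sack_counter_at_least(const struct sack_counter *sc, uint32_t amount)
{
	return (sc->sacked_bytes >= amount);
}
```

The cost moves from every duplicate ACK to the places where the
scoreboard is already being modified.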

Best regards,
  Richard Scheffenegger





From: weongyo at freebsd.org
Date: Sat May 7 00:19:38 UTC 2011

Hello all,

I'd like to send another patch to support RFC5827 in the TCP stack,
which can be found at:

	http://people.freebsd.org/~weongyo/patch_20110506_rfc5827.diff

This patch supports both Early Retransmit variants (Byte-Based Early
Retransmit and Segment-Based Early Retransmit) when the
net.inet.tcp.rfc5827 sysctl knob is turned on.

Please note that the Segment-Based Early Retransmit logic is split out
into a khelp module because it adds extra operations and needs
additional variables to track segment boundaries on the right side of
the window.

So if the khelp module is loaded, the segment-based logic is preferred;
if not, the default logic is `Byte-Based Early Retransmit'.
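
For readers unfamiliar with the two variants, the sketch below shows
the threshold formulas as given in RFC 5827 (ER_thresh =
ceiling(ownd/SMSS) - 1 for the byte-based variant, oseg - 1 for the
segment-based one). It is not code from the patch: the function names
and the can_send_new flag are hypothetical, and falling back to the
standard threshold of 3 when only one segment is outstanding is a
choice made for this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define DUPACK_THRESH	3	/* standard fast retransmit threshold */

/*
 * Byte-based variant.  ownd is the number of outstanding (sent but not
 * cumulatively ACKed) bytes; can_send_new is nonzero when unsent data
 * exists and the receive window allows sending it.
 */
static uint32_t
er_thresh_bytes(uint32_t ownd, uint32_t smss, int can_send_new)
{
	assert(smss > 0);
	if (can_send_new || ownd >= 4 * smss || ownd <= smss)
		return (DUPACK_THRESH);
	return ((ownd + smss - 1) / smss - 1);	/* ceiling(ownd/SMSS) - 1 */
}

/* Segment-based variant.  oseg is the number of outstanding segments. */
static uint32_t
er_thresh_segs(uint32_t oseg, int can_send_new)
{
	if (can_send_new || oseg >= 4 || oseg < 2)
		return (DUPACK_THRESH);
	return (oseg - 1);
}
```

With SACK in use, the RFC additionally requires that "ownd - SMSS"
bytes (or oseg - 1 segments) have been SACKed before retransmitting
early, which is the check discussed in the reply above.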

I based this on DragonFly BSD's implementation, but it did not appear
to match my reading of the RFC specification, so I changed most of it.
In my test environment it appears to work correctly.

Please review and test my work and tell me if you have any concerns or
questions.

regards,
Weongyo Jeong

-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch_20110506_rfc5827.diff
Type: text/x-diff
Size: 18455 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-net/attachments/20110507/90f2f164/patch_20110506_rfc5827.bin


