kern/132832: tcp_output() might generate invalid TSO frames when
len > TCP_MAXWIN - hdrlen - optlen
Renaud Lienhart
renaud at vmware.com
Thu Mar 19 18:50:02 PDT 2009
>Number: 132832
>Category: kern
>Synopsis: tcp_output() might generate invalid TSO frames when len > TCP_MAXWIN - hdrlen - optlen
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Fri Mar 20 01:50:01 UTC 2009
>Closed-Date:
>Last-Modified:
>Originator: Renaud Lienhart
>Release: FreeBSD 7.1
>Organization:
VMware, Inc.
>Environment:
>Description:
The tcp_output() routine has an issue when the send window exceeds TCP_MAXWIN and the underlying interface supports TSO.
There is a check to ensure the data being pushed doesn't exceed the maximum TSO packet size. If this is the case, "len" is trimmed and "sendalot" is set:
--- 8< ---
if (tso) {
if (len > TCP_MAXWIN - hdrlen - optlen) {
len = TCP_MAXWIN - hdrlen - optlen;
len = len - (len % (tp->t_maxopd - optlen));
sendalot = 1;
--- >8 ---
Everything's hunky-dory, until the next "sendalot" iteration;
If the remaining window does not require TSO (i.e. len <= tp->t_maxseg), this following piece of code fails to properly reset "tso":
--- 8< ---
if (len > tp->t_maxseg) {
if (<tso_conds>)
tso = 1;
} else {
len = tp->t_maxseg;
sendalot = 1;
tso = 0;
}
}
--- >8 ---
"tso" remains set to 1 and the resulting packet is tagged with CSUM_TSO although it doesn't require TSO. This causes problems with a large number of nics which refuse to send the resulting frame and, in some case, wedge.
>How-To-Repeat:
Using netperf (or any TCP workload) with a large socket buffer size exposes the issue in a matter of seconds.
>Fix:
The solution is to always reset "tso" at the beginning of the "sendalot" loop in order to ensure it is not stale.
In my patch I also added a KASSERT() which directly catches any problematic frame before it reaches any other layer.
Patch attached with submission follows:
Index: netinet/tcp_output.c
===================================================================
--- netinet/tcp_output.c (revision 190117)
+++ netinet/tcp_output.c (working copy)
@@ -140,7 +140,7 @@
int idle, sendalot;
int sack_rxmit, sack_bytes_rxmt;
struct sackhole *p;
- int tso = 0;
+ int tso;
struct tcpopt to;
#if 0
int maxburst = TCP_MAXBURST;
@@ -198,6 +198,7 @@
SEQ_LT(tp->snd_nxt, tp->snd_max))
tcp_sack_adjust(tp);
sendalot = 0;
+ tso = 0;
off = tp->snd_nxt - tp->snd_una;
sendwin = min(tp->snd_wnd, tp->snd_cwnd);
sendwin = min(sendwin, tp->snd_bwnd);
@@ -477,7 +478,6 @@
} else {
len = tp->t_maxseg;
sendalot = 1;
- tso = 0;
}
}
if (sack_rxmit) {
@@ -996,6 +996,8 @@
* XXX: Fixme: This is currently not the case for IPv6.
*/
if (tso) {
+ KASSERT(len > tp->t_maxopd - optlen,
+ ("%s: len <= tso_segsz", __func__));
m->m_pkthdr.csum_flags = CSUM_TSO;
m->m_pkthdr.tso_segsz = tp->t_maxopd - optlen;
}
>Release-Note:
>Audit-Trail:
>Unformatted:
More information about the freebsd-bugs
mailing list