kern/127928: The TCP bandwidth gets squeezed every time tcp_xmit_bandwidth_limit() kicks in
Renaud Lienhart
renaud at vmware.com
Tue Oct 7 18:20:01 UTC 2008
>Number: 127928
>Category: kern
>Synopsis: The TCP bandwidth gets squeezed every time tcp_xmit_bandwidth_limit() kicks in
>Confidential: no
>Severity: non-critical
>Priority: medium
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Tue Oct 07 18:20:01 UTC 2008
>Closed-Date:
>Last-Modified:
>Originator: Renaud Lienhart
>Release: FreeBSD 6.2+
>Organization:
VMware, Inc.
>Environment:
ESX 3.x
>Description:
FreeBSD 6, 7, and 8 have a bug in their tcp_xmit_bandwidth_limit() function.
The problem has two parts. First, the bandwidth calculation is a weighted moving average in which each new sample counts for only 1/16. Second, the saved bandwidth (tp->snd_bandwidth) is reset to 0 every time the RTT threshold is crossed. As a result, the bandwidth estimate (and therefore the bandwidth-delay product, and therefore the sending window) is briefly squeezed whenever the mechanism kicks in, and it takes a while to climb back to sane values.
The default threshold is 10ms. With HZ=100, one tick is 10ms, so an RTT of 0 ticks disables the mechanism and an RTT of 1 tick activates it. Because of this poor tick granularity, a short RTT is regularly measured as 1 tick rather than 0, and each time that happens the mechanism kicks in, brutally squeezing the sending window because tp->snd_bandwidth is 0.
The problem also appears with HZ=1000 when the RTT fluctuates around the 10ms threshold.
>How-To-Repeat:
Run a kernel with HZ=100 (or HZ=1000 with a latency of about 10ms, which is harder to set up) under any TCP load, with tcp_inflight_debug = 1. Notice that the bandwidth reported in the log is inconsistent with the link capacity and that tp->snd_bwnd is incorrectly low.
>Fix:
To fix this, kickstart tp->snd_bandwidth with the raw bandwidth sample, skipping the weighted-average smoothing, whenever it is 0.
Patch attached with submission follows:
Index: netinet/tcp_subr.c
===================================================================
--- netinet/tcp_subr.c (revision 183668)
+++ netinet/tcp_subr.c (working copy)
@@ -1783,7 +1783,9 @@
tp->t_bw_rtseq = ack_seq;
if (tp->t_bw_rtttime == 0 || (int)bw < 0)
return;
- bw = ((int64_t)tp->snd_bandwidth * 15 + bw) >> 4;
+ if (tp->snd_bandwidth != 0) {
+ bw = ((int64_t)tp->snd_bandwidth * 15 + bw) >> 4;
+ }
tp->snd_bandwidth = bw;
>Release-Note:
>Audit-Trail:
>Unformatted: