question about trimming data "len" conditions in TSO in tcp_output.c
Cui, Cheng
Cheng.Cui at netapp.com
Sun Apr 10 20:49:53 UTC 2016
Hi Hans,
I would like to continue this discussion with a different change. The change
is shown here, and I have also attached it as "change.patch" against the
FreeBSD HEAD code line.
diff --git a/sys/netinet/tcp_output.c b/sys/netinet/tcp_output.c
index 2043fc9..fa124f1 100644
--- a/sys/netinet/tcp_output.c
+++ b/sys/netinet/tcp_output.c
@@ -938,25 +938,16 @@ send:
* fractional unless the send sockbuf can be
* emptied:
*/
- max_len = (tp->t_maxseg - optlen);
- if ((off + len) < sbavail(&so->so_snd)) {
+ max_len = (tp->t_maxopd - optlen);
+ if (len > (max_len << 1)) {
moff = len % max_len;
if (moff != 0) {
len -= moff;
sendalot = 1;
}
}
-
- /*
- * In case there are too many small fragments
- * don't use TSO:
- */
- if (len <= max_len) {
- len = max_len;
- sendalot = 1;
- tso = 0;
- }
-
+ KASSERT(len > max_len,
+ ("[%s:%d]: len <= max_len", __func__, __LINE__));
/*
* Send the FIN in a separate segment
* after the bulk sending is done.
I think this change avoids extra loops that each send a single MSS-sized
packet, so some CPU cycles can be saved as well: the change reduces the
number of software sends and pushes more data to the offloading (TSO) sends.
Here is my test. The iperf command I chose pushes 100 MBytes of data onto
the wire, with the default TCP sendspace set to 1MB and recvspace to 2MB. I
tested the TCP connection performance on a pair of 10Gbps FreeBSD 10.2 nodes
(s1 and r1) with a switch in between. Both nodes have TSO and delayed ACK
enabled.
root at s1:~ # ping -c 3 r1
PING r1-link1 (10.1.2.3): 56 data bytes
64 bytes from 10.1.2.3: icmp_seq=0 ttl=64 time=0.045 ms
64 bytes from 10.1.2.3: icmp_seq=1 ttl=64 time=0.037 ms
64 bytes from 10.1.2.3: icmp_seq=2 ttl=64 time=0.038 ms
--- r1-link1 ping statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.037/0.040/0.045/0.004 ms
1M snd buffer/2M rcv buffer
sysctl -w net.inet.tcp.hostcache.expire=1
sysctl -w net.inet.tcp.sendspace=1048576
sysctl -w net.inet.tcp.recvspace=2097152
iperf -s <== iperf command at receiver
iperf -c r1 -m -n 100M <== iperf command at sender
root at s1:~ # iperf -c r1 -m -n 100M
------------------------------------------------------------
Client connecting to r1, TCP port 5001
TCP window size: 1.00 MByte (default)
------------------------------------------------------------
[ 3] local 10.1.2.2 port 22491 connected with 10.1.2.3 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 0.3 sec 100 MBytes 2.69 Gbits/sec
[ 3] MSS size 1448 bytes (MTU 1500 bytes, ethernet)
root at r1:~ # iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 2.00 MByte (default)
------------------------------------------------------------
[ 4] local 10.1.2.3 port 5001 connected with 10.1.2.2 port 22491
[ ID] Interval Transfer Bandwidth
[ 4] 0.0- 0.3 sec 100 MBytes 2.62 Gbits/sec
Each test sent 100 MBytes of data, and I collected packet traces from both
nodes with tcpdump. I ran the test twice to confirm the result is
reproducible.
From the trace files of both nodes before my code change, I see a lot of
single-MSS size packets; see the attached trace files in
"before_change.zip". For example, in a sender trace file I see 43480
single-MSS size packets (tcp.len == 1448) out of 57005 packets that contain
data (tcp.len > 0). That's 76.2%.
Then I ran the same iperf test after the change and gathered trace files. I
did not find many single-MSS packets this time; see the attached trace files
in "after_change.zip". For example, in a sender trace file I see zero
single-MSS size packets (tcp.len == 1448) out of 35729 data packets
(tcp.len > 0). Comparing the receiver traces, I did not see significantly
more fractional packets received after the change.
I also ran tests using netperf, although I did not reach the 95% confidence
interval for every snd/rcv buffer size tested. Attached are my netperf
results for different snd/rcv buffer sizes before and after the change
(netperf_before_change.txt and netperf_after_change.txt), which also look
good.
The netperf command used:
netperf -H s1 -t TCP_STREAM -C -c -l 400 -i 10,3 -I 95,10 -- -s
${LocalSndBuf} -S ${RemoteSndBuf}
Thanks,
--Cheng Cui
NetApp Scale Out Networking
-------------- next part --------------
A non-text attachment was scrubbed...
Name: change.patch
Type: application/octet-stream
Size: 864 bytes
Desc: change.patch
URL: <http://lists.freebsd.org/pipermail/svn-src-head/attachments/20160410/0b5bda11/attachment.obj>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: netperf_before_change.txt
URL: <http://lists.freebsd.org/pipermail/svn-src-head/attachments/20160410/0b5bda11/attachment.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: netperf_after_change.txt
URL: <http://lists.freebsd.org/pipermail/svn-src-head/attachments/20160410/0b5bda11/attachment-0001.txt>