FreeBSD 7.0 / Recv-Q full ? / win 0 ?

Andreas Carbin andreas.carbin at run.se
Sun Nov 23 18:00:50 PST 2008


Hello all,

I have the following issue with my (quite newly installed) FreeBSD 7.0 machines:

(I use "FreeBSD 7.0-RELEASE-p5 #0: Wed Oct  1 07:51:58 UTC 2008" on Dell PowerEdge 2970.)

When I copy large files with SCP from one host to another, the destination host's receive queue (Recv-Q) seems to fill up after a random number of seconds (10 - 300) with about 89,000 bytes, and the destination host then sends Window Size = 0 to the sender. This means no data is transferred and the connection has "locked up" in some way (true?).

This almost always happens when I copy a file between two hosts that are connected over a WAN. I have checked the firewall rules - they are open to almost any traffic. (I have also seen it happen between two locally connected machines.) When the SCP copy starts, it runs perfectly at about 10 megabytes/s (100 Mbit/s WAN network). A 3 GB file succeeds less than 5% of the time. The error occurs after about 10 to 300 seconds - then all payload data traffic stops. The TCP connection is still open.

My guess was that maybe we get errors when copying this fast, "close to the theoretical limit", so I used "scp -l <num>", choosing <num> to limit the rate to 50 and then 5 Mbit/s. This reduces the speed as expected, but gives me the same errors as at full speed.
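For reference, "scp -l" takes its limit in Kbit/s, so the two rates mentioned above correspond to 50000 and 5000. A minimal sketch of that arithmetic follows; the function names are made up for illustration, and decimal units (1 Mbit = 1000 Kbit, 3 GB = 3,000,000,000 bytes) are an assumption:

```python
def scp_limit_kbit(mbit_per_s):
    """Value to pass to 'scp -l' (Kbit/s) for a rate given in Mbit/s."""
    return int(mbit_per_s * 1000)


def transfer_seconds(size_bytes, mbit_per_s):
    """Ideal transfer time in seconds, ignoring protocol overhead."""
    return size_bytes * 8 / (mbit_per_s * 1_000_000)


if __name__ == "__main__":
    three_gb = 3_000_000_000  # assumed decimal gigabytes
    for rate in (50, 5):
        print("scp -l", scp_limit_kbit(rate),
              "ideal time:", transfer_seconds(three_gb, rate), "s")
```

Since the stall shows up after 10 to 300 seconds, even the throttled runs last long enough to hit it.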

I have also tried (with no good results):

* net.inet.tcp.rfc1323 (on and off)
* net.inet.tcp.tso (on and off)
* RXCSUM and TXCSUM on and off
* change from on-board bce0 / Broadcom NetXtreme II BCM5708 1000Base-T to em0 / Intel(R) PRO/1000 Network Connection Version - 6.7.3
* setting net.inet.tcp.recvbuf_max: 16777216
* setting net.inet.tcp.sendbuf_max: 16777216
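
To keep track of which of the sysctl settings above were active during a given test run, their values could be captured and logged with each attempt. This is only a sketch: the helper names are hypothetical, and it assumes "sysctl name ..." prints one "name: value" line per variable, as it does on FreeBSD.

```python
import subprocess

# Sysctl names taken from the list above.
SYSCTLS = [
    "net.inet.tcp.rfc1323",
    "net.inet.tcp.tso",
    "net.inet.tcp.recvbuf_max",
    "net.inet.tcp.sendbuf_max",
]


def parse_sysctl_output(text):
    """Turn 'name: value' lines into a dict of strings."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if sep:  # skip lines that do not look like 'name: value'
            values[name.strip()] = value.strip()
    return values


def read_sysctls(names=SYSCTLS):
    """Run sysctl(8) for the given names and parse its output."""
    result = subprocess.run(
        ["sysctl", *names], capture_output=True, text=True, check=True
    )
    return parse_sysctl_output(result.stdout)
```

Calling read_sysctls() before each scp attempt and storing the result next to the outcome makes it easier to see whether any combination changes the behaviour.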

One really strange thing is that I can make the copy continue (!) with full data transfer if I truss the ssh process on the destination machine. So if I run truss on it in the background with output to /dev/null, the whole copy completes (!!!!).

This is a tcpdump on destination host of SCP's TCP connection when no data is transferred: 

15:56:17.798079 IP sender_host.51296 > destination_host.ssh: . 8:9(1) ack 1 win 33304 <nop,nop,timestamp 1435178754 1291017157>
15:56:17.897407 IP destination_host.ssh > sender_host.51296: . ack 9 win 0 <nop,nop,timestamp 1291022157 1435178754>
15:56:22.797808 IP sender_host.51296 > destination_host.ssh: . 9:10(1) ack 1 win 33304 <nop,nop,timestamp 1435183754 1291022157>
15:56:22.897457 IP destination_host.ssh > sender_host.51296: . ack 10 win 0 <nop,nop,timestamp 1291027157 1435183754>
15:56:27.797913 IP sender_host.51296 > destination_host.ssh: . 10:11(1) ack 1 win 33304 <nop,nop,timestamp 1435188754 1291027157>
15:56:27.897508 IP destination_host.ssh > sender_host.51296: . ack 11 win 0 <nop,nop,timestamp 1291032157 1435188754>
15:56:32.798016 IP sender_host.51296 > destination_host.ssh: . 11:12(1) ack 1 win 33304 <nop,nop,timestamp 1435193754 1291032157>
15:56:32.897559 IP destination_host.ssh > sender_host.51296: . ack 12 win 0 <nop,nop,timestamp 1291037157 1435193754>
15:56:37.798119 IP sender_host.51296 > destination_host.ssh: . 12:13(1) ack 1 win 33304 <nop,nop,timestamp 1435198754 1291037157>
15:56:37.897610 IP destination_host.ssh > sender_host.51296: . ack 13 win 0 <nop,nop,timestamp 1291042157 1435198754>
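
The trace above shows the sender probing with 1 byte every 5 seconds and the receiver acknowledging each probe while still advertising win 0. A rough sketch for spotting this pattern in a longer capture follows; the regular expression only assumes the tcpdump line layout shown above, and the function names are hypothetical:

```python
import re
from datetime import datetime

# Matches lines like the ones above: time, "IP src > dst:", then "ack N win M".
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d+) IP (?P<src>\S+) > (?P<dst>\S+): "
    r".*?ack (?P<ack>\d+) win (?P<win>\d+)"
)


def parse_line(line):
    """Return a dict for one tcpdump line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "time": datetime.strptime(m.group("time"), "%H:%M:%S.%f"),
        "src": m.group("src"),
        "dst": m.group("dst"),
        "ack": int(m.group("ack")),
        "win": int(m.group("win")),
    }


def zero_window_report(lines):
    """Count zero-window segments and the gaps (seconds) between them."""
    packets = [p for p in map(parse_line, lines) if p is not None]
    zero = [p for p in packets if p["win"] == 0]
    gaps = [(b["time"] - a["time"]).total_seconds()
            for a, b in zip(zero, zero[1:])]
    return {"packets": len(packets), "zero_window": len(zero), "gaps": gaps}


SAMPLE = [
    "15:56:17.798079 IP sender_host.51296 > destination_host.ssh: . 8:9(1) ack 1 win 33304 <nop,nop,timestamp 1435178754 1291017157>",
    "15:56:17.897407 IP destination_host.ssh > sender_host.51296: . ack 9 win 0 <nop,nop,timestamp 1291022157 1435178754>",
    "15:56:22.797808 IP sender_host.51296 > destination_host.ssh: . 9:10(1) ack 1 win 33304 <nop,nop,timestamp 1435183754 1291022157>",
    "15:56:22.897457 IP destination_host.ssh > sender_host.51296: . ack 10 win 0 <nop,nop,timestamp 1291027157 1435183754>",
]

if __name__ == "__main__":
    print(zero_window_report(SAMPLE))
```

A steady run of win 0 replies at a fixed interval, as here, suggests the receiving application has stopped reading from the socket rather than that packets are being lost on the way.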

Does anyone have an idea what this might be?

The error occurs when the receiving host is a FreeBSD 7.0 host (the sender can be 7.0 or 6.2 according to my tests).

Thank you,

//Andreas 

-------------------------------------------------------
Andreas Carbin
RUN Communications AB 
http://www.run.se 
E-mail: andreas.carbin at run.se 
-------------------------------------------------------


More information about the freebsd-net mailing list