repeatable scp stalls from 7.0 to 7.0

Chris Buechler freebsd at chrisbuechler.com
Sun Aug 17 19:15:20 UTC 2008


I've been seeing pretty frequent and repeatable scp stalls between two 
FreeBSD 7.0 servers (7.0-RELEASE-p2 to be exact) on a 100 Mb LAN. 
They're two HP servers, an Opteron 275 and a dual Xeon 3.4 (don't recall 
the models but I can get them if it's relevant) using the onboard bge(4) 
cards. The client side (builder7) SCPs a file to the server side 
(hosting7) about 20 times a day. The stall happens about 2-4 times a 
week or so, and has happened ever since we put these two boxes online in 
their current functions. Initially they ran the original 7.0 release, 
from before the TCP fix in June. The behavior has been the same both 
before and after that fix. Neither box shows any other apparent network 
issue.

Since we had nothing to go on other than scp sessions going to "stalled" 
(no relevant logs), I set up a tcpdump on each end filtering on the TCP 
port 22 traffic between these hosts, grabbing only the first 100 bytes 
of each frame to avoid chewing up too much disk space. When it happened 
again, I split the tail end of each capture out into its own file with 
editcap, 4.2-4.3 MB each.
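
The exact tcpdump invocation is not given above, so the following is only a sketch of what such a header-only capture might look like. The interface name (bge0), the output file name, and the helper name capture_command are assumptions for illustration; -i, -s and -w are the standard tcpdump interface, snaplen and write-to-file options.

```python
import shlex


def capture_command(host_a, host_b, outfile, iface="bge0", snaplen=100, port=22):
    """Return a tcpdump argv for a truncated capture of one TCP port
    between two hosts. iface and outfile are placeholders, not values
    taken from the original setup."""
    # BPF filter: only traffic on the given TCP port between the two hosts
    bpf = f"tcp port {port} and host {host_a} and host {host_b}"
    return ["tcpdump", "-i", iface, "-s", str(snaplen), "-w", outfile, bpf]


if __name__ == "__main__":
    argv = capture_command("172.29.29.170", "172.29.29.181", "scp-stall.pcap")
    # shlex.join quotes the filter expression so the line can be pasted into a shell
    print(shlex.join(argv))
```

A 100-byte snaplen still covers the Ethernet, IP and TCP headers, which is all that is needed to follow the advertised window.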

http://chrisbuechler.com/temp/lastcut-hosting7.pcap - server end, 
capture taken on host but destination IP is a jail
http://chrisbuechler.com/temp/lastcut-builder7.pcap - client end, 
connection is initiated from the host, no jails involved.

The TCP window advertised in the ACKs from server to client starts 
shrinking [1] until it reaches 0. From that point on, everything the 
server (172.29.29.181) sends back to the client (172.29.29.170) 
advertises a window of 0. Restarting the scp makes it work again. It 
doesn't happen every time, only on roughly 2-3% of transfers. I don't 
see any cause for the shrinking window in those captures, but maybe I'm 
missing something.

[1] lastcut-hosting7.pcap frame #21298; lastcut-builder7.pcap frame #25088
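
For anyone who would rather script the check than scroll through the captures, here is a minimal sketch that walks a capture and reports the window advertised by one sender. It assumes a classic libpcap file (not pcapng), Ethernet framing without VLAN tags, and IPv4; the function names are made up for this example. It reads the raw 16-bit window field and ignores any window scale option, which does not matter for spotting a window of 0.

```python
import socket
import struct


def tcp_windows(pcap_bytes, src_ip):
    """Yield (frame_number, raw_window) for each IPv4/TCP segment sent by
    src_ip. Frame numbers start at 1, as in Wireshark."""
    magic = pcap_bytes[:4]
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"  # little-endian file (microsecond or nanosecond variant)
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"  # big-endian file
    else:
        raise ValueError("not a classic pcap file")
    linktype = struct.unpack(endian + "I", pcap_bytes[20:24])[0]
    if linktype != 1:
        raise ValueError("only Ethernet captures are handled")
    src = socket.inet_aton(src_ip)
    offset, frame = 24, 0  # the global header is 24 bytes
    while offset + 16 <= len(pcap_bytes):
        # Record header: ts_sec, ts_usec, incl_len, orig_len (4 bytes each)
        incl_len = struct.unpack(endian + "I", pcap_bytes[offset + 8:offset + 12])[0]
        pkt = pcap_bytes[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len
        frame += 1
        # 14-byte Ethernet header; EtherType 0x0800 means IPv4
        if len(pkt) < 34 or pkt[12:14] != b"\x08\x00":
            continue
        ihl = (pkt[14] & 0x0F) * 4  # IPv4 header length in bytes
        # IP protocol 6 is TCP; source address sits at IP offset 12
        if pkt[23] != 6 or pkt[26:30] != src:
            continue
        tcp = 14 + ihl
        if len(pkt) < tcp + 16:
            continue
        # The window field is at bytes 14-15 of the TCP header
        yield frame, struct.unpack("!H", pkt[tcp + 14:tcp + 16])[0]


def first_zero_window(pcap_bytes, src_ip):
    """Return the first frame number where src_ip advertises window 0,
    or None if it never does."""
    for frame, window in tcp_windows(pcap_bytes, src_ip):
        if window == 0:
            return frame
    return None


if __name__ == "__main__":
    with open("lastcut-hosting7.pcap", "rb") as f:
        data = f.read()
    print(first_zero_window(data, "172.29.29.181"))
```

Printing every (frame, window) pair from tcp_windows instead of only the first zero shows how quickly the window shrinks before it hits 0.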

These are both very stock boxes, GENERIC kernels, no significant changes 
in sysctl or anything else. I'm not sure where to go from here; any 
assistance in resolving this would be appreciated.

cheers,
Chris



More information about the freebsd-net mailing list