repeatable scp stalls from 7.0 to 7.0
Chris Buechler
freebsd at chrisbuechler.com
Sun Aug 17 19:15:20 UTC 2008
I've been seeing pretty frequent and repeatable scp stalls between two
FreeBSD 7.0 servers (7.0-RELEASE-p2 to be exact) on a 100 Mb LAN.
They're two HP servers, an Opteron 275 and a dual Xeon 3.4 (don't recall
the models but I can get them if it's relevant) using the onboard bge(4)
cards. The client side (builder7) SCPs a file to the server side
(hosting7) about 20 times a day. The stall happens about 2-4 times a
week or so, and has happened ever since we put these two boxes online in
their current functions. Initially they were the original 7.0 release,
prior to the TCP fix in June. It's behaved the same way both prior to
and after that fix. There are no apparent network issues aside from this
with either of the boxes.
Since we had nothing to go on other than scp sessions going to "stalled"
(no relevant logs), I set up a tcpdump on each end filtering on the TCP
port 22 traffic between these hosts, grabbing 100 bytes of each frame to
avoid chewing up too much disk space. When it happened again I split the
end of each capture out into its own file with editcap, 4.2-4.3 MB each.
http://chrisbuechler.com/temp/lastcut-hosting7.pcap - server end,
capture taken on host but destination IP is a jail
http://chrisbuechler.com/temp/lastcut-builder7.pcap - client end,
connection is initiated from the host, no jails involved.
The TCP window on the ACKs from server to client starts shrinking [1],
to the point where it's down to a window of 0. From that point,
everything the server (172.29.29.181) sends back to the client
(172.29.29.170) has a window of 0.
Restarting the scp makes it work again. It doesn't happen every time,
only somewhere around 2-3% of the time. I don't see any cause for the
shrinking window in those captures but maybe I'm missing something.
[1] lastcut-hosting7.pcap frame #21298; lastcut-builder7.pcap frame #25088
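For anyone who wants to check the captures without opening Wireshark, here
is a rough Python sketch that walks a classic pcap file and reports the
first frame in which a given sender advertises a TCP window of 0. The
function names (tcp_windows, first_zero_window) are made up for this
example, and it assumes Ethernet II framing, IPv4, and the classic
microsecond pcap format. It reads the raw 16-bit window field only; if
window scaling was negotiated the real window is that value shifted by the
scale factor, but a raw 0 is still 0.

```python
import struct


def tcp_windows(pcap_bytes, src_ip):
    """Yield (frame_number, raw_window) for TCP segments sent by src_ip.

    Frame numbers are 1-based, matching how Wireshark counts frames.
    Assumes classic pcap, Ethernet II link layer, and IPv4.
    """
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written on a little-endian host
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file written on a big-endian host
    else:
        raise ValueError("not a classic (microsecond) pcap file")

    want = bytes(int(part) for part in src_ip.split("."))
    offset, frame = 24, 0  # skip the 24-byte global header
    while offset + 16 <= len(pcap_bytes):
        _sec, _usec, caplen, _origlen = struct.unpack_from(
            endian + "IIII", pcap_bytes, offset
        )
        offset += 16
        pkt = pcap_bytes[offset:offset + caplen]
        offset += caplen
        frame += 1

        # Need a full Ethernet + minimal IPv4 header, EtherType 0x0800.
        if len(pkt) < 34 or pkt[12:14] != b"\x08\x00":
            continue
        ihl = (pkt[14] & 0x0F) * 4  # IPv4 header length in bytes
        # IP protocol 6 is TCP; source address sits at IP offset 12.
        if pkt[23] != 6 or pkt[26:30] != want:
            continue
        tcp = 14 + ihl
        if len(pkt) < tcp + 16:
            continue  # truncated before the window field
        (window,) = struct.unpack_from("!H", pkt, tcp + 14)
        yield frame, window


def first_zero_window(pcap_bytes, src_ip):
    """Return the first frame where src_ip advertises window 0, else None."""
    for frame, window in tcp_windows(pcap_bytes, src_ip):
        if window == 0:
            return frame
    return None
```

To use it on one of the files above, read the file in binary mode and pass
the server address, for example
first_zero_window(open("lastcut-hosting7.pcap", "rb").read(), "172.29.29.181").
Printing the pairs from tcp_windows() for the frames before that point
shows how the advertised window shrinks on the way down.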
These are both very stock boxes, GENERIC kernels, no significant changes
in sysctl or anything else. I'm not sure where to go from here, so any
assistance in resolving this would be appreciated.
cheers,
Chris