bin/62055: scp(1) incorrectly reports "stalled" on slow copies

Wed Jan 28 21:10:17 PST 2004

>Number:         62055
>Category:       bin
>Synopsis:       scp(1) incorrectly reports "stalled" on slow copies
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Jan 28 21:10:14 PST 2004
>Closed-Date:
>Last-Modified:
>Originator:     Peter Jeremy
>Release:        FreeBSD 4.9-RELEASE-p1 i386
>Organization:
Alcatel Australia Limited
>Environment:
System: FreeBSD gsmx07.alcatel.com.au 4.9-RELEASE-p1 FreeBSD 4.9-RELEASE-p1 #0: Fri Dec 19 07:22:50 EST 2003 root at aalp03.alcatel.com.au:/mnt/obj/mnt/rpc/src/sys/gsmx i386

>Description:
	By default, scp(1) will provide a progress meter showing the
	transfer ETA.  If the link is slow, the transfer meter will
	alternate between displaying "- stalled -" and unrealistically
	short ETAs even though the actual connection is transferring
	data smoothly (as shown by tcpdump).

	By default, the progress meter is updated every second.  If
	there has been no apparent progress in the transfer after 5
	seconds, the progress meter will report "stalled" until some
	progress is reported.  There appear to be two issues that
	will result in long delays between output progress being seen
	by the progress meter.

	Firstly, output from the scp process is in filesystem blocksize
	blocks - the number of bytes transferred (used by the progres
	meter) will only be incremented when a full block of data has
	been transferred.  Therefore if the transfer rate is less than
	1.6KB/sec (old 8K filesystem) or 3.2KB/sec (newer 16KB filesystem)
	then the link will report as "stalled".  (Identified by code
	inspection).

	Secondly, the ssh process spawned by the scp process to perform
	the actual encryption and transfer includes a substantial
	internal buffer (>64KB) and appears to implement hysteresis.
	ktrace output of a sample transfer shows a peak of over 96KB
	buffered - at which point the ssh process stops reading until
	the buffer drops to about 32KB.  This implies that there is
	approximately 64KB hysteresis and a transfer rate below about
	13KB/sec can result in "stalled" reports.

	This behaviour is undesirable and appears to be a recent change.

>How-To-Repeat:

	Use ipfw/dummynet or similar to reduce the outgoing ssh
	bandwidth to a second system and transfer a file:

# ipfw pipe 20 config queue 10 bw 80000
# ipfw add 1005 pipe 20 tcp from any to 192.168.164.18 22
# dd if=/dev/urandom of=data count=512
# scp data 192.168.164.18:/tmp

>Fix:

	Unclear - the buffering in both scp and ssh as well as the
	hysteresis in ssh are beneficial to maximize transfer bandwidth
	and minimise context switching.  It would not be desirable to
	reduce these sizes when ssh is used across a LAN.

	In the case of scp, changing from atomicio(write, ...) to
	write(...) would remove the requirement to write at least
	filesystem_blocksize bytes/sec to the remote system.

	In the case of ssh, the hysteresis needs to be adjusted based
	on the outgoing bandwidth - this could possibly be done by
	resetting the "don't read more" flag after (say) 1 second.
>Release-Note:
>Audit-Trail:
>Unformatted: