scp more perfectly fills the pipe than NFS/TCP
Matthew Dillon
dillon at apollo.backplane.com
Mon Dec 21 21:39:31 UTC 2009
I'm just covering all the bases. To be frank, half the time when
someone posts they are doing something a certain way it turns out that
they actually aren't. I've learned that covering the bases tends to
lead to solutions more quickly than assuming a perfect rendition.
For example, is that 10ms latency with a ping? What about a
ping -s 4000? If you are talking about 16KB RPC transactions over
TCP, then the real question is: what is the latency for 16KB of data
coming back along the wire?
In your case we can calculate the read-ahead needed to keep the pipe
full. 500 KBytes/sec divided by 16KB is about 31 transactions per second,
or an effective latency of 32ms, plus probably 5-10ms for the RPC to be
sent... so probably more around 40ms, not 10ms. And if you are using
32KB transactions the latency is going to be more around 70ms (64ms
plus the same overhead).
500 KBytes/sec x 40ms is about 20KB, so theoretically a read-ahead of
two 16KB RPCs should do the trick.
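The arithmetic above can be sketched as a bandwidth-delay product calculation. This is a minimal sketch: the function names are hypothetical, and the 8ms request overhead is an assumed midpoint of the 5-10ms guess, not a measured value. It deliberately ignores reordering, which the next paragraph deals with.

```python
import math

def transaction_ms(rate_kb_s: float, xfer_kb: float) -> float:
    """Wire time for one transaction of xfer_kb at rate_kb_s."""
    return xfer_kb * 1000.0 / rate_kb_s

def in_flight_kb(rate_kb_s: float, turn_ms: float) -> float:
    """Bandwidth-delay product: data that must be outstanding to keep the pipe full."""
    return rate_kb_s * turn_ms / 1000.0

def read_ahead_rpcs(rate_kb_s: float, turn_ms: float, xfer_kb: float) -> int:
    """Whole RPCs of xfer_kb needed to cover the bandwidth-delay product."""
    return math.ceil(in_flight_kb(rate_kb_s, turn_ms) / xfer_kb)

RATE_KB_S = 500.0   # observed throughput
OVERHEAD_MS = 8.0   # ASSUMED cost of sending the RPC request (midpoint of 5-10ms)

for xfer in (16.0, 32.0):
    turn = transaction_ms(RATE_KB_S, xfer) + OVERHEAD_MS
    print(f"{xfer:.0f}KB RPCs: ~{turn:.0f}ms turn, "
          f"{in_flight_kb(RATE_KB_S, turn):.0f}KB in flight, "
          f"read-ahead {read_ahead_rpcs(RATE_KB_S, turn, xfer)}")
```

With these numbers the 16KB case works out to a ~40ms turn, 20KB in flight and a read-ahead of 2; the 32KB case gives a ~72ms turn and also a read-ahead of 2.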
There's a catch, however. Depending on the client-side implementation,
the read-ahead requests may be transmitted out of order. That is,
if the cp or dd program wants to read blocks 0, 1, 2, 3, 4, the
actual RPCs sent over the wire might be sent like this: 0, 2, 1, 4, 3,
or even 0, 4, 1, 2, 3. Someone who knows what work has been done on the
FreeBSD NFS stack can tell you whether that is still the case. If
the nfsiods (whether kernel threads or not) issue separate synchronous
RPCs, then the read-ahead can transmit the RPC requests out of order.
The server may also respond to them out of order (there are typically
4 server-side threads handling RPCs). The combination is deadly.
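A toy, deterministic model shows how independent synchronous workers can put requests on the wire out of order. This is an illustrative sketch, not how the FreeBSD nfsiods are actually scheduled: each block is queued in order, but its (hypothetical) worker transmits it only after a per-request wakeup delay, and the delay values are made up to reproduce the 0, 2, 1, 4, 3 pattern.

```python
from typing import List

def wire_order(queue_times_us: List[int], wakeup_delays_us: List[int]) -> List[int]:
    """Return block numbers in the order their READ RPCs hit the wire.

    Block i is queued at queue_times_us[i], but its independent, synchronous
    worker only transmits it wakeup_delays_us[i] later, so nothing forces
    transmission to follow queue order.
    """
    send = [(q + d, blk)
            for blk, (q, d) in enumerate(zip(queue_times_us, wakeup_delays_us))]
    return [blk for _, blk in sorted(send)]

# cp/dd asks for blocks 0..4, queued 100us apart (illustrative numbers only).
queued = [0, 100, 200, 300, 400]
delays = [0, 250, 0, 250, 0]
print(wire_order(queued, delays))  # [0, 2, 1, 4, 3]
```

The same function, fed the server threads' completion times instead of client wakeup delays, models replies coming back out of order.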
If the read-aheads are transmitted out of order, what happens is that
cp/dd/whatever on the client winds up stalling while waiting for the
next linear block to come back, which might be BEHIND a later
read-ahead block coming back down the wire. That is, the stall
(the RPC latency) winds up being multiplied by N. A 40ms turn can
become an 80 or 120ms turn before cp/dd/whatever unstalls.
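The stall is head-of-line blocking, and it can be sketched with a few lines. This assumes a strictly sequential reader that can only consume block i after block i-1; the arrival times are illustrative, chosen so that block 1 comes back last even though it was asked for second.

```python
from typing import List

def delivery_times(arrival_ms: List[float]) -> List[float]:
    """When a strictly sequential reader (cp/dd) can consume each block.

    arrival_ms[i] is when the reply for block i finished arriving. Block i
    cannot be handed over before block i-1, so one late block holds up
    every block behind it, even blocks that have already arrived.
    """
    out: List[float] = []
    t = 0.0
    for a in arrival_ms:
        t = max(t, a)
        out.append(t)
    return out

# 40ms turn; block 1 was reordered behind blocks 2 and 3 (illustrative).
arrivals = [40.0, 120.0, 72.0, 104.0]
print(delivery_times(arrivals))  # [40.0, 120.0, 120.0, 120.0]
```

Block 1 takes three times the nominal 40ms turn, and blocks 2 and 3 sit idle on the client until it shows up.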
To deal with this you want to set the read-ahead higher... probably at
least three or four RPCs.
As I said, there are other issues as the amount of read-ahead
increases. The only way to really figure out what is going on is
to tcpdump the link and determine why the pipeline is not being
maintained. Look for out of order requests, out of order responses,
and stalls (actual lost packets).
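Once a capture has been reduced to (timestamp, READ offset) pairs, the checks described above are easy to automate. This is a hedged sketch: the tcpdump decoding step is not shown and is assumed to have been done separately, the helper names are hypothetical, and the 50ms gap threshold is an arbitrary example. A gap only flags a stall; telling a lost packet apart from a slow server still requires looking at the TCP sequence numbers.

```python
from typing import List, Tuple

# (timestamp in ms, byte offset of the READ), ASSUMED already extracted
# from a tcpdump capture; parsing is not shown here.
Event = Tuple[float, int]

def out_of_order(events: List[Event]) -> List[int]:
    """Offsets that show up on the wire after a higher offset was already seen."""
    late: List[int] = []
    highest = -1
    for _, off in sorted(events):
        if off < highest:
            late.append(off)
        highest = max(highest, off)
    return late

def stalls(events: List[Event], gap_ms: float) -> List[Tuple[float, float]]:
    """Gaps longer than gap_ms during which nothing was seen in this direction."""
    times = sorted(t for t, _ in events)
    return [(a, b) for a, b in zip(times, times[1:]) if b - a > gap_ms]

BLK = 16384  # 16KB READs
requests = [(0.0, 0), (0.1, 2 * BLK), (0.2, 1 * BLK), (0.3, 4 * BLK), (0.4, 3 * BLK)]
replies = [(40.0, 0), (72.0, 2 * BLK), (104.0, 4 * BLK), (190.0, 1 * BLK)]

print(out_of_order(requests))   # [16384, 49152]
print(out_of_order(replies))    # [16384]
print(stalls(replies, 50.0))    # [(104.0, 190.0)]
```

Running the request and reply directions separately keeps out-of-order requests distinguishable from out-of-order responses.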
Actual lost packets are not likely in your case, assuming you are
using something like fair-share scheduling and not RED. (RED should
only be used by routers in the middle of a large network; it should
never be used at the end-points.)
-Matt