Slow disk write speeds over network
Terry Lambert
tlambert2 at mindspring.com
Wed Jun 11 00:47:54 PDT 2003
Jason Stone wrote:
> > You haven't said if you were using UDP or TCP for the mounts; you
> > should definitely use TCP with FreeBSD NFS servers; it's also just
> > generally a good idea, since UDP frags act as a fixed non-sliding
> > window: NFS over UDP sucks.
>
> Huh. I thought that the conventional wisdom was that on a local network
> with no packet loss (and therefore no re-transmission penalties), udp was
> way faster because the overhead was so much less.
>
> Sorry if this seems like a pretty basic question, but can you explain
> this?
Sure:
1) There is no such thing as no packet loss.
2) The UDP packets are reassembled in a reassembly queue
on the receiver. While this is happening, you can only
have one datagram outstanding at a time. With TCP, you
get a sliding window; with UDP, you stall waiting for
the reassembly, effectively giving you a non-sliding
window (request/response, with round trip latencies per
packet, instead of two of them amortized across a 100M
file transfer).
3) When a packet is lost, the UDP retransmit code is rather
crufty. It resends the whole series of packets, and you
eat the overhead for that. TCP, on the other hand, can
do selective acknowledgement, or, if it's not supported
by both ends, it can at least acknowledge the packets
that did get through, saving you a retransmit.
4) FreeBSD's UDP fragment reassembly buffer code is well
known to pretty much suck. This is true of most UDP
fragment reassembly code in the universe, however, and
is not that specific to FreeBSD. So sending UDP packets
that get fragged because they're larger than your MTU is
not a very clever way of achieving a fixed window size
larger than the MTU (see also #2, above, for why you do
not want to use an effectively fixed-window protocol
anyway).
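
To put rough numbers on points 3 and 4, here is a minimal sketch in Python. It assumes a simplified model that is not from the original post: each IP fragment is lost independently with the same probability, losing any one fragment loses the whole datagram, and the sender then resends every fragment. The function names, the 1500-byte MTU, and the header sizes are illustrative assumptions, and RPC/NFS header bytes are ignored.

```python
import math

def fragments_per_datagram(payload_bytes, mtu=1500, ip_header=20, udp_header=8):
    """Number of IP fragments needed for one UDP datagram (assumed IPv4, no options)."""
    # Fragment data must be a multiple of 8 bytes (except in the last fragment).
    per_fragment = (mtu - ip_header) // 8 * 8
    return math.ceil((payload_bytes + udp_header) / per_fragment)

def datagram_loss_probability(fragment_loss, n_fragments):
    """Chance that at least one fragment is lost, which loses the whole datagram."""
    return 1.0 - (1.0 - fragment_loss) ** n_fragments

def expected_fragments_sent(fragment_loss, n_fragments):
    """Expected fragments on the wire per delivered datagram when every
    loss forces a resend of the entire datagram (simplified model)."""
    p_delivered = (1.0 - fragment_loss) ** n_fragments
    return n_fragments / p_delivered

if __name__ == "__main__":
    for size in (8192, 32768):  # hypothetical rsize/wsize values
        n = fragments_per_datagram(size)
        p = datagram_loss_probability(0.01, n)
        print(f"{size} bytes: {n} fragments, datagram loss {p:.3%}, "
              f"expected fragments sent {expected_fragments_sent(0.01, n):.2f}")
```

Under these assumptions, a larger rsize/wsize means more fragments per datagram, so a given per-fragment loss rate turns into a noticeably higher per-datagram loss rate, and each loss costs a full resend.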
Even if there were no packet loss at all with UDP, unless all
your data is around the size of one rsize/wsize/packet, the
per-request round trips add up: over even a moderately large
number of packets in a single run, the combined RTT latency
costs more than the additional TCP overhead does once that
overhead is amortized over the transfer. Depending on your
hardware (switch latency, half duplex, etc.), you could also be
talking about a significant combined bandwidth delay product.
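
The amortization argument can be sketched with a back-of-the-envelope model. This is an illustration under stated assumptions, not a measurement: the fixed-window case pays one full round trip per rsize-sized request, the sliding-window case pays a fixed number of round trips for the whole transfer (two, following point 2 above), and both are otherwise limited only by link bandwidth. The function names and the example figures (100 Mbit/s link, 0.2 ms RTT, 8192-byte requests) are hypothetical.

```python
def stop_and_wait_seconds(total_bytes, chunk_bytes, rtt_s, bandwidth_bps):
    """Transfer time when only one request may be outstanding at a time."""
    chunks = -(-total_bytes // chunk_bytes)      # ceiling division
    wire_time = total_bytes * 8 / bandwidth_bps  # time spent actually sending bits
    return chunks * rtt_s + wire_time            # one round trip stall per chunk

def sliding_window_seconds(total_bytes, rtt_s, bandwidth_bps, fixed_rtts=2):
    """Transfer time when the pipe stays full; latency is paid a fixed number of times."""
    wire_time = total_bytes * 8 / bandwidth_bps
    return fixed_rtts * rtt_s + wire_time

if __name__ == "__main__":
    total = 100 * 1024 * 1024  # a 100M file, as in the example above
    rtt = 0.0002               # assumed 0.2 ms LAN round trip
    bw = 100_000_000           # assumed 100 Mbit/s link
    print("fixed window  :", stop_and_wait_seconds(total, 8192, rtt, bw), "s")
    print("sliding window:", sliding_window_seconds(total, rtt, bw), "s")
```

In this model the gap between the two grows linearly with the number of requests, which is why it only disappears when the data fits in roughly one rsize/wsize-sized request.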
Now add to all this that you have to send explicit ACKs with UDP,
while you can use piggy-back ACKs on the return payloads for TCP.
I think the idea that UDP was OK for nearly-lossless short-haul
came about from people who couldn't code a working TCP NFS client.
8-).
-- Terry
More information about the freebsd-performance
mailing list