Old SUN NFS performance papers.
Eric Anderson
anderson at centtech.com
Mon Jan 19 08:34:48 PST 2004
Steve Francis wrote:
> Benchmarking seems like the best thing to do, however I have some info
> I've collected from prior posts:
>
> These are from the thread "Slow disk write speeds over network" on
> freebsd-performance at freebsd.org, all written by Terry Lambert.
>
> You should definitely use TCP with FreeBSD NFS servers; it's
> also just generally a good idea, since UDP frags act as a fixed
> non-sliding window: NFS over UDP sucks.
>
> Also, you haven't said whether you are using aliases on your
> network cards; aliases and NFS tend to interact badly.
>
> Finally, you probably want to tweak some sysctl's, e.g.
>
> net.inet.ip.check_interface=0
> net.inet.tcp.inflight_enable=1
> net.inet.tcp.inflight_debug=0
> net.inet.tcp.msl=3000
> net.inet.tcp.inflight_min=6100
> net.isr.enable=1
>
> Given your overloading of your bus, that last one is probably
> the most important one: it enables direct dispatch.
>
> You'll also want to enable DEVICE_POLLING in your kernel
> config file (assuming you have a good ethernet card whose
> driver supports it):
>
> options DEVICE_POLLING
> options HZ=2000
>
>
> ...and yet more sysctl's for this:
>
> kern.polling.enable=1
> kern.polling.user_frac=50 # 0..100; whatever works best
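
As a rough illustration of the HZ setting above (not part of the original thread): with polling, the NIC is serviced on clock ticks, so the tick period bounds how long a frame can wait and how many frames can pile up between polls. The function names, the 1 Gbit/s link speed, and the full-size 1500-byte frames below are assumptions chosen for the arithmetic, not measured values.

```python
# Back-of-the-envelope polling arithmetic; every figure here is illustrative.

# Per-frame bytes on the wire besides the payload:
# Ethernet header (14) + FCS (4) + preamble (8) + inter-frame gap (12).
WIRE_OVERHEAD_BYTES = 14 + 4 + 8 + 12


def poll_interval_ms(hz):
    """Time between clock ticks, in milliseconds, for a kernel built with HZ=hz."""
    return 1000.0 / hz


def frames_per_tick(link_bps, hz, payload_bytes=1500):
    """Full-size frames that can arrive between two ticks at line rate."""
    frame_bits = (payload_bytes + WIRE_OVERHEAD_BYTES) * 8
    frames_per_second = link_bps / frame_bits
    return frames_per_second / hz


if __name__ == "__main__":
    for hz in (100, 1000, 2000):
        print(hz, poll_interval_ms(hz), round(frames_per_tick(1e9, hz), 1))
```

Under these assumptions, a lower HZ means more frames have to be buffered by the card between polls, which is one reason a higher HZ is paired with DEVICE_POLLING.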
>
> If you've got a really terrible Gigabit Ethernet card, then
> you may be copying all your packets over again (e.g. m_pullup()),
> and that could be eating your bus, too.
>
>
>> Huh. I thought that the conventional wisdom was that on a local network
>> with no packet loss (and therefore no re-transmission penalties), UDP
>> was way faster because the overhead was so much less.
>>
>> Sorry if this seems like a pretty basic question, but can you explain
>> this?
>
>
> Sure:
>
> 1) There is no such thing as no packet loss.
>
> 2) The UDP packets are reassembled in a reassembly queue
> on the receiver. While this is happening, you can only
> have one datagram outstanding at a time. With TCP, you
> get a sliding window; with UDP, you stall waiting for
> the reassembly, effectively giving you a non-sliding
> window (request/response, with round trip latencies per
> packet, instead of two of them amortized across a 100M
> file transfer).
>
> 3) When a packet is lost, the UDP retransmit code is rather
> crufty. It resends the whole series of packets, and you
> eat the overhead for that. TCP, on the other hand, can
> do selective acknowledgement, or, if it's not supported
> by both ends, it can at least acknowledge the packets
> that did get through, saving you a retransmit.
>
> 4) FreeBSD's UDP fragment reassembly buffer code is well
> known to pretty much suck. This is true of most UDP
> fragment reassembly code in the universe, however, and
> is not that specific to FreeBSD. So sending UDP packets
> that get fragged because they're larger than your MTU is
> not a very clever way of achieving a fixed window size
> larger than the MTU (see also #2, above, for why you do
> not want to use an effectively fixed window protocol
> anyway).
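
Points 3 and 4 above can be made concrete with a small sketch (not from the original thread). It assumes that a UDP datagram larger than the MTU is split into IP fragments and that losing any one fragment loses the whole datagram, so the whole datagram has to be resent. The function names, the 8 KB transfer size, the 1500-byte MTU, and the 0.1% frame loss rate are hypothetical values picked for illustration.

```python
import math


def fragments_per_datagram(datagram_bytes, mtu=1500, ip_header=20):
    """IP fragments needed for one datagram (UDP header plus data).

    Each fragment carries at most (mtu - ip_header) bytes, rounded down
    to a multiple of 8 as IP fragment offsets require.
    """
    per_fragment = (mtu - ip_header) // 8 * 8
    return math.ceil(datagram_bytes / per_fragment)


def datagram_loss_probability(frame_loss, nfrags):
    """Chance that at least one of nfrags independent fragments is lost."""
    return 1.0 - (1.0 - frame_loss) ** nfrags


if __name__ == "__main__":
    datagram = 8192 + 8  # assumed 8 KB of data plus the 8-byte UDP header
    n = fragments_per_datagram(datagram)
    p = datagram_loss_probability(0.001, n)
    print("fragments per datagram:", n)
    print("datagram loss probability:", round(p, 6))
    # A single lost fragment costs a resend of the whole datagram over UDP,
    # versus roughly one MTU-sized segment over TCP.
    print("bytes resent (UDP vs TCP):", datagram, "vs about", 1500)
```

In this model the per-datagram loss rate is roughly the per-frame loss rate multiplied by the fragment count, and each loss costs a full datagram rather than a single segment.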
>
> Even if there were no packet loss at all with UDP, unless all
> your data is around the size of one rsize/wsize/packet, the
> combined RTT overhead for even a moderately large number of
> packets in a single run is enough to make the amortized cost
> of the additional TCP overhead lower than the UDP overhead
> from the latency. Depending on your hardware (switch latency,
> half duplex, etc.), you could also be talking about a significant
> combined bandwidth-delay product.
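
The RTT argument above can be sketched as a toy model (not from the original thread). It compares a strict request/response transfer, which pays one round trip per chunk, with a windowed transfer that pays a couple of round trips in total, as in the "two of them amortized across a 100M file transfer" remark. The function names and all numbers (100 MB file, 8 KB chunks, 0.5 ms RTT, 100 Mbit/s link) are assumptions, and the model ignores any read-ahead or concurrent requests a real NFS client may issue, so it overstates the stall.

```python
def stop_and_wait_seconds(total_bytes, chunk_bytes, rtt_s, link_bps):
    """One chunk outstanding at a time: every chunk pays a full round trip."""
    chunks = -(-total_bytes // chunk_bytes)  # ceiling division
    return chunks * rtt_s + total_bytes * 8 / link_bps


def pipelined_seconds(total_bytes, rtt_s, link_bps, setup_rtts=2):
    """Sliding window assumed large enough to keep the link full."""
    return setup_rtts * rtt_s + total_bytes * 8 / link_bps


def bandwidth_delay_product_bytes(link_bps, rtt_s):
    """Bytes that must be in flight to keep the link busy for one RTT."""
    return link_bps * rtt_s / 8


if __name__ == "__main__":
    size = 100 * 1024 * 1024   # assumed 100 MB transfer
    chunk = 8192               # assumed rsize/wsize
    rtt = 0.0005               # assumed 0.5 ms round trip
    link = 100e6               # assumed 100 Mbit/s link
    print("request/response:", round(stop_and_wait_seconds(size, chunk, rtt, link), 3), "s")
    print("sliding window:  ", round(pipelined_seconds(size, rtt, link), 3), "s")
    print("BDP:", bandwidth_delay_product_bytes(link, rtt), "bytes")
```

With these made-up figures, the per-chunk round trips add several seconds to a transfer that otherwise takes about eight, which is the amortization effect described above.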
>
> Now add to all this that you have to send explicit ACKs with UDP,
> while you can use piggy-back ACKs on the return payloads for TCP.
>
> I think the idea that UDP was OK for nearly-lossless short-haul
> came about from people who couldn't code a working TCP NFS client.
>
>
> Eric Anderson wrote:
>
>> Willem Jan Withagen wrote:
>>
>>> Hi,
>>>
>>> I had no responses to my recent question on the difference between
>>> NFS over UDP and TCP. So perhaps nobody cares??
>>>
>>> So I tried searching but have not found much yet.
>>> Does anybody know where to find the white papers SUN once wrote
>>> about tuning NFS??? They should be at sun, but where??
>>> All other suggestions to read are welcomed as well.
>>>
>>> Given my last posting I'm building two machines to do some NFS
>>> benchmark testing on.
>>> Suggestions on what people "always wanted to know (tm)" are also
>>> welcome, and I'll see if I can get them integrated.
>>> I've found the benchmarks in /usr/ports; some might do some nice work
>>> as well.
>>>
>>> If people are interested I'll keep them posted in performance@
>>>
>>
>> I'm definitely interested in what you find. I run a few heavily used
>> FreeBSD NFS servers, and am therefore always looking for tweaks and knobs
>> to turn to make things better. In my experience, UDP has always been
>> faster than TCP on NFS performance. Prior to 5.2, I have also seen
>> mbuf-related issues (all pretty much solvable with the right sysctl's).
>> Let me know if I can help.
>>
>> Eric
>>
>>
>
I wasn't even sure where to start or stop snipping on this mail, since
it is all good stuff - so I didn't. :) Thanks for the great info, and
good explanations. NFS+TCP is very nice, but I do believe the UDP
transport was faster on a handful of tests (however, I typically force
use of TCP when I can).
One question - what does net.inet.ip.check_interface=0 do?
Eric
--
------------------------------------------------------------------
Eric Anderson Systems Administrator Centaur Technology
All generalizations are false, including this one.
------------------------------------------------------------------