misc/145189: nfsd performs abysmally under load

Tue Mar 30 20:11:45 UTC 2010

On Tue, 30 Mar 2010, Rich wrote:

> On Tue, Mar 30, 2010 at 11:50 AM, Bruce Evans <brde at optusnet.com.au> wrote:

>>> For instance, copying a 4GB file over NFSv3 from a ZFS filesystem with the
>>> following flags
>>> [rw,nosuid,hard,intr,nofsc,tcp,vers=3,rsize=8192,wsize=8192,sloppy,addr=X.X.X.X] (Linux
>>> client, the above is the server), I achieve 2 MB/s, fluctuating between 1
>>> and 3. (pv reports 2.23 MB/s avg)

I also tried various nfs r/w sizes and tcp/udp.  The best sizes are
probably the fs block size or twice that (normally 16K for ffs).  Old
versions of FreeBSD had even more bugs in this area and gave surprising
performance differences depending on the nfs r/w sizes or application
i/o sizes.  In some cases smaller sizes worked best, apparently because
they avoided the stalls.
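Here is a rough way to see why the r/w size matters. Suppose, as a simplifying assumption, that the client keeps only one RPC outstanding at a time. Real clients also do readahead and write-behind. Under that assumption, each transfer costs one round trip plus its wire time, so throughput is capped at size / (RTT + size / bandwidth). The Python sketch below evaluates that model with a hypothetical helper, `nfs_throughput_mb_s`. The 100 uS RTT and 100 MB/s wire speed are illustrative values, not measurements from this thread.

```python
def nfs_throughput_mb_s(io_size_bytes: int, rtt_s: float, wire_mb_s: float) -> float:
    """Throughput of strictly serialized NFS reads/writes (toy model).

    Assumes exactly one RPC in flight at a time: each request costs one
    round trip plus the time to move io_size_bytes at wire speed.
    """
    wire_bytes_s = wire_mb_s * 1_000_000
    per_rpc_s = rtt_s + io_size_bytes / wire_bytes_s
    return io_size_bytes / per_rpc_s / 1_000_000


# Illustrative inputs only: 100 uS round trip, 100 MB/s wire speed.
for size in (8192, 16384, 32768):
    print(size, round(nfs_throughput_mb_s(size, 100e-6, 100.0), 1))
```

In this model, larger sizes always win because the fixed round trip is amortized over more data. The model does not capture the stalls that sometimes made smaller sizes faster.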

>>> ...
>> Enabling polling is a good way to destroy latency.  A ping latency of
>> ...

> Actually, we noticed that throughput appeared to get marginally better while
> causing occasional bursts of crushing latency, but yes, we have it on in the
> kernel without using it in any actual NICs at present. :)
>
> But yes, I'm getting 40-90+ MB/s, occasionally slowing to 20-30 MB/s
> (average of 52.7 MB/s after copying a 6.5 GB file), on localhost IPv4
> with no additional mount flags. With {r,w}size=8192 on localhost it goes
> up to 80-100 MB/s, with occasional sinks to 60 MB/s (average after
> copying another, separate 6.5 GB file: 77.3 MB/s).

I thought you said you often got 1-3MB/S.
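Converting the reported averages into wall-clock copy times makes the gap concrete. This sketch assumes decimal units (1 GB = 1000 MB) and that each reported average covers the whole copy. `copy_time_s` is a hypothetical helper.

```python
def copy_time_s(size_gb: float, avg_mb_s: float) -> float:
    """Seconds to copy size_gb at a constant average rate (decimal units assumed)."""
    return size_gb * 1000 / avg_mb_s


# Figures quoted in this thread.
for label, size_gb, rate in [
    ("Linux client, 4 GB at 2.23 MB/s", 4.0, 2.23),
    ("localhost default sizes, 6.5 GB at 52.7 MB/s", 6.5, 52.7),
    ("localhost {r,w}size=8192, 6.5 GB at 77.3 MB/s", 6.5, 77.3),
]:
    print(f"{label}: {copy_time_s(size_gb, rate):.0f} s")
```

The 4 GB remote copy takes roughly half an hour (about 1794 s). The 6.5 GB localhost copies take about two minutes or less.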

> Also:
> 64 bytes from 127.0.0.1: icmp_seq=0 ttl=64 time=0.015 ms
> 64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.049 ms
> 64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.012 ms

Fairly normal slowness for -current.

> 64 bytes from [actual IP]: icmp_seq=0 ttl=64 time=0.019 ms
> 64 bytes from [actual IP]: icmp_seq=1 ttl=64 time=0.015 ms

Are these with external hardware NICs?  Then 15 uS is excellent, better
than I've ever seen.  Very good hardware might be able to do this, but
I suspect it is for the local machine.  BTW, I don't like the times
being reported in ms, with sub-uS times not supported.  I sometimes
run Linux or cygwin ping and don't like its lack of support for sub-mS
times: it always reports 0 for my average latency of 100 uS.
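The ping complaint is a resolution problem. A tool that prints whole milliseconds truncates a 100 uS round trip to 0. The sketch below contrasts three hypothetical formatters, none of which is taken from any actual ping source:

- one that prints whole ms;
- one that prints three decimals of ms (1 uS resolution, as in the output quoted above);
- one that prints microseconds with a fractional part, which would show sub-uS differences.

```python
def format_whole_ms(rtt_s: float) -> str:
    # Truncates to whole milliseconds: any sub-ms round trip prints as 0.
    return f"time={int(rtt_s * 1000)} ms"


def format_ms_3dp(rtt_s: float) -> str:
    # Three decimal places of ms: 1 uS resolution, still no sub-uS detail.
    return f"time={rtt_s * 1000:.3f} ms"


def format_us(rtt_s: float) -> str:
    # Microseconds with one decimal place: 0.1 uS resolution.
    return f"time={rtt_s * 1e6:.1f} us"


rtt = 100e-6  # 100 uS, the average latency mentioned above
print(format_whole_ms(rtt), format_ms_3dp(rtt), format_us(rtt))
```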

>> After various tuning and bug fixing (now partly committed by others) I get
>> improvements like the following on low-end systems with ffs (I don't use
>> zfs):
>> - very low end with 100Mbps ethernet: little change; bulk transfers always
>>  went at near wire speed (about 10 MB/S)
>> - low end with 1Gbps: bulk transfers up from 20MB/S to 45MB/S (local ffs
>>  50MB/S).  buildworld over nfs of 5.2 world down from 1200 seconds to 800
>>  seconds (this one is very latency-sensitive.  Takes about 750 seconds on
>>  local ffs).
>
> Is this on 9.0-CURRENT, or RELENG_8, or something else?

Mostly with 7-CURRENT or 8-CURRENT a couple of years ago.  Sometimes with
a ~5.2-SERVER.  nfs didn't vary much with the server, except there were
surprising differences due to latency that I never tracked down.
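The before/after figures for the 1Gbps low-end system reduce to simple ratios. This Python sketch only does that arithmetic on numbers taken from the quoted text. The helper names are hypothetical.

```python
def throughput_gain(before_mb_s: float, after_mb_s: float) -> float:
    # Higher is better for throughput, so after / before.
    return after_mb_s / before_mb_s


def time_speedup(before_s: float, after_s: float) -> float:
    # Lower is better for elapsed time, so invert the ratio.
    return before_s / after_s


bulk = throughput_gain(20, 45)    # bulk transfers over 1Gbps, MB/S
build = time_speedup(1200, 800)   # buildworld of 5.2 world over nfs, seconds
nfs_vs_local = 800 / 750          # remaining overhead vs local ffs
print(f"bulk x{bulk:.2f}, buildworld x{build:.2f}, nfs/local {nfs_vs_local:.3f}")
```

After tuning, the nfs buildworld ran about 7% slower than on local ffs, down from 60% slower (1200/750 = 1.6). Bulk transfers reached 90% of local ffs speed (45 of 50 MB/S).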

I forgot to mention another thing you can try easily:

- negative name caching (remembering failed name lookups).  Improves latency.  I used this to reduce makeworld
   times significantly, and it is now standard in -current but not
   enabled by default.
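Negative name caching means remembering that a lookup failed. Repeated lookups of the same missing name then need not each cost a LOOKUP round trip. Builds are a plausible source of such lookups, for example when a compiler probes several include directories. The Python sketch below illustrates only the general technique; it is not the FreeBSD nfs client's implementation. `NegativeNameCache`, `remote_lookup` and `neg_ttl_s` are hypothetical names. Negative entries expire after a timeout, so a file later created on the server is eventually seen.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class NegativeNameCache:
    """Toy lookup wrapper that remembers names that do NOT exist.

    remote_lookup stands in for an over-the-wire LOOKUP RPC (hypothetical):
    it returns a handle string, or None if the name is absent.  Only misses
    are cached here, to keep the example focused on the negative case.
    """

    def __init__(self, remote_lookup: Callable[[str, str], Optional[str]],
                 neg_ttl_s: float, clock: Callable[[], float] = time.monotonic):
        self._remote_lookup = remote_lookup
        self._neg_ttl_s = neg_ttl_s
        self._clock = clock
        self._negative: Dict[Tuple[str, str], float] = {}  # key -> expiry time
        self.rpcs = 0  # number of simulated round trips issued

    def lookup(self, directory: str, name: str) -> Optional[str]:
        key = (directory, name)
        expires = self._negative.get(key)
        if expires is not None:
            if self._clock() < expires:
                return None  # cached miss: no round trip needed
            del self._negative[key]  # entry expired; ask the server again
        self.rpcs += 1
        result = self._remote_lookup(directory, name)
        if result is None:
            self._negative[key] = self._clock() + self._neg_ttl_s
        return result
```

The timeout is the trade-off. A longer one saves more round trips, but it lets this client miss a newly created file for longer.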

Bruce