misc/145189: nfsd performs abysmally under load

Tue Mar 30 16:56:14 UTC 2010

On Tue, Mar 30, 2010 at 11:50 AM, Bruce Evans <brde at optusnet.com.au> wrote:
> Does it work better when limited to 1 thread (nfsd -n 1)?  In at least
> some versions of it (or maybe in nfsiod), multiple threads fight each other
> under load.

It doesn't seem to - nfsd -n 1 still ranges between 1-3 MB/s for files
larger than RAM on server or client (6 and 4 GB, respectively).

>> For instance, copying a 4GB file over NFSv3 from a ZFS filesystem with the
>> following flags
>> [rw,nosuid,hard,intr,nofsc,tcp,vers=3,rsize=8192,wsize=8192,sloppy,addr=X.X.X.X] (Linux
>> client, the above is the server), I achieve 2 MB/s, fluctuating between 1
>> and 3. (pv reports 2.23 MB/s avg)
>>
>> Locally, on the server, I achieve 110-140 MB/s (at the end of pv, it
>> reports 123 MB/s avg).
>>
>> I'd assume network latency, but nc with no flags other than port achieves
>> 30-50 MB/s between server and client.
>>
>> Latency is also abysmal - ls on a randomly chosen homedir full of files,
>> according to time, takes:
>> real    0m15.634s
>> user    0m0.012s
>> sys     0m0.097s
>> while on the local machine:
>> real    0m0.266s
>> user    0m0.007s
>> sys     0m0.000s
>
> It probably is latency.  nfs is very latency-sensitive when there are lots
> of small files.  Transfers of large files shouldn't be affected so much.

Sure, and next on my TODO is to look into whether 9.0-CURRENT makes
certain ZFS high-latency things perform better.

>> The server in question is a 3GHz Core 2 Duo, running FreeBSD RELENG_8. The
>> kernel conf, DTRACE_POLL, is just the stock AMD64 kernel with all of the
>> DTRACE-related options turned on, as well as the option to enable polling in
>> the NIC drivers, since we were wondering if that would improve our
>> performance.
>
> Enabling polling is a good way to destroy latency.  A ping latency of
> more than about 50uS causes noticeable loss of performance for nfs, but
> LAN latency is usually a few times higher than that, and polling without
> increasing the clock interrupt frequency to an excessively high value
> gives a latency of at least 20 times higher than that.  Also, -current
> with debugging options is so bloated that even localhost has a ping
> latency of about 50uS on a Core2 (up from 2uS for FreeBSD-4 on an
> AthlonXP).  Anyway try nfs on localhost to see if reducing the latency
> helps.
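
A rough way to see why latency matters for throughput: if a client had only
one read RPC in flight at a time, each rsize-sized read would cost one round
trip, so throughput could not exceed rsize / RTT. The sketch below is a
back-of-the-envelope model under that assumption only (real clients do
read-ahead, so this is a pessimistic bound, not a description of how the
NFS client actually behaves). The function names are made up for
illustration.

```python
def max_throughput_mb_s(rsize_bytes, rtt_seconds):
    """Upper bound in MB/s with one rsize-sized request in flight per round trip."""
    return rsize_bytes / rtt_seconds / 1e6


def implied_request_time_ms(rsize_bytes, throughput_mb_s):
    """Per-request time in ms implied by an observed throughput (same assumption)."""
    return rsize_bytes / (throughput_mb_s * 1e6) * 1e3


if __name__ == "__main__":
    rsize = 8192  # rsize from the mount flags quoted in this thread
    for rtt_us in (50, 200, 1000):  # illustrative round-trip times, not measurements
        bound = max_throughput_mb_s(rsize, rtt_us * 1e-6)
        print(f"RTT {rtt_us:>5} us -> at most {bound:7.1f} MB/s")
    # Turn the question around: what per-request time would explain 2 MB/s?
    print(f"2 MB/s -> about {implied_request_time_ms(rsize, 2.0):.1f} ms per request")
```

Under this simplified model, 2 MB/s with 8 KB reads corresponds to roughly
4 ms per request.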

Actually, we noticed that throughput appeared to get marginally better while
causing occasional bursts of crushing latency, but yes, we have it on in the
kernel without using it in any actual NICs at present. :)

But yes, on localhost IPv4 with no additional mount flags I'm getting
40-90+ MB/s, occasionally slowing to 20-30 MB/s, averaging 52.7 MB/s
over a 6.5 GB file copy. With {r,w}size=8192 on localhost it goes up to
80-100 MB/s, with occasional dips to 60 (average after copying
another, separate 6.5 GB file: 77.3 MB/s).

Also:
64 bytes from 127.0.0.1: icmp_seq=0 ttl=64 time=0.015 ms
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.049 ms
64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.012 ms
64 bytes from [actual IP]: icmp_seq=0 ttl=64 time=0.019 ms
64 bytes from [actual IP]: icmp_seq=1 ttl=64 time=0.015 ms

>> We tested this with a UFS directory as well, because we were curious if
>> this was an NFS/ZFS interaction - we still got 1-2 MB/s read speed and
>> horrible latency while achieving fast throughput and latency local to the
>> server, so we're reasonably certain it's not "just" ZFS, if there is indeed
>> any interaction there.
>
> After various tuning and bug fixing (now partly committed by others) I get
> improvements like the following on low-end systems with ffs (I don't use
> zfs):
> - very low end with 100Mbps ethernet: little change; bulk transfers always
>  went at near wire speed (about 10 MB/S)
> - low end with 1Gbps ethernet: bulk transfers up from 20MB/S to 45MB/S (local ffs
>  50MB/S).  buildworld over nfs of 5.2 world down from 1200 seconds to 800
>  seconds (this one is very latency-sensitive.  Takes about 750 seconds on
>  local ffs).

Is this on 9.0-CURRENT, or RELENG_8, or something else?

>> Read speed of a randomly generated 6500 MB file on UFS over NFSv3 with the
>> same flags as above: 1-3 MB/s, averaging 2.11 MB/s
>> Read speed of the same file, local to the server: consistently between
>> 40-60 MB/s, averaging 61.8 MB/s [it got faster over time - presumably UFS
>> was aggressively caching the file, or something?]
>
> You should use a file size larger than the size of main memory to prevent
> caching, especially for reads.  That is 1GB on my low-end systems.

I didn't mention the server's RAM, explicitly, but it has 6 GB of real
RAM, and the files used were 6.5-7 GB each in that case (I did use a
4GB file earlier - I've avoided doing that again here).
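
For anyone repeating the test, a minimal sketch of generating a random test
file somewhat larger than RAM, so that it cannot be served entirely from
cache. The helper names and the 1.1 safety margin are assumptions for
illustration, not something used in the tests above.

```python
import os


def benchmark_file_size(ram_bytes, margin=1.1):
    """Pick a size larger than RAM (margin is an arbitrary illustrative factor)."""
    return int(ram_bytes * margin)


def write_random_file(path, size_bytes, chunk_bytes=1 << 20):
    """Write size_bytes of random data to path in chunks; return bytes written."""
    written = 0
    with open(path, "wb") as f:
        while written < size_bytes:
            n = min(chunk_bytes, size_bytes - written)
            f.write(os.urandom(n))  # random data, so the content is not compressible
            written += n
    return written


if __name__ == "__main__":
    ram = 6 * 1024**3  # 6 GB, the server RAM mentioned in this thread
    # "testfile.bin" is a placeholder path; point it at the exported filesystem.
    write_random_file("testfile.bin", benchmark_file_size(ram))
```

Writing in 1 MB chunks keeps memory use flat regardless of the file size.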

>> Read speed of the same file over NFS again, after the local test:
>> Amusingly, worse (768 KB/s-2.2 MB/s, with random stalls - average reported
>> 270 KB/s(!)).
>
> The random stalls are typical of the problem with the nfsd's getting
> in each other's way, and/or of related problems.  The stalls that I
> saw were very easy to see in real time using "netstat -I <interface>
> 1" -- they happened every few seconds and lasted a second or 2.  But
> they were never long enough to reduce the throughput by more than a
> factor of 3, so I always got over 19 MB/S.  The throughput was reduced
> by approximately the ratio of stalled time to non-stalled time.
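
One way to read the stall arithmetic above: if the link runs at its peak
rate except during stalls, the average rate is the peak scaled by the
fraction of time not stalled. The sketch below assumes that simple on/off
model. The function names, the sample numbers, and the idea of feeding in
per-second byte counts (such as those read off a 1-second interface
counter display) are illustrative assumptions.

```python
def effective_throughput(peak_mb_s, active_s, stalled_s):
    """Average rate when transfer alternates between peak rate and full stalls."""
    return peak_mb_s * active_s / (active_s + stalled_s)


def stalled_fraction(bytes_per_second, threshold=0):
    """Fraction of 1-second samples whose byte count is at or below threshold."""
    samples = list(bytes_per_second)
    if not samples:
        return 0.0
    stalled = sum(1 for b in samples if b <= threshold)
    return stalled / len(samples)


if __name__ == "__main__":
    # Hypothetical pattern: 3 seconds of transfer, then a 1.5 second stall.
    print(effective_throughput(45.0, 3.0, 1.5))
    # Hypothetical per-second byte counts with two stalled seconds out of six.
    print(stalled_fraction([40_000_000, 41_000_000, 0, 0, 39_000_000, 42_000_000]))
```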

I believe it. I'm seeing at least partially similar behavior here: the
performance drops I mentioned, where the transfer briefly pauses and then
picks up again, show up in the localhost case even with nfsd -n 1 and
nfsiod -n 1.

- Rich