misc/145189: nfsd performs abysmally under load

Rich rercola at acm.jhu.edu
Tue Mar 30 17:00:20 UTC 2010


The following reply was made to PR misc/145189; it has been noted by GNATS.

From: Rich <rercola at acm.jhu.edu>
To: Bruce Evans <brde at optusnet.com.au>
Cc: freebsd-gnats-submit at freebsd.org, freebsd-bugs at freebsd.org
Subject: Re: misc/145189: nfsd performs abysmally under load
Date: Tue, 30 Mar 2010 12:29:37 -0400

 On Tue, Mar 30, 2010 at 11:50 AM, Bruce Evans <brde at optusnet.com.au> wrote:
 > Does it work better when limited to 1 thread (nfsd -n 1)?  In at least
 > some versions of it (or maybe in nfsiod), multiple threads fight each other
 > under load.
 
 It doesn't seem to - nfsd -n 1 still ranges between 1-3 MB/s for files
 larger than RAM on either the server or the client (6 and 4 GB, respectively).
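 
 For reference, the single-thread run on the server amounts to roughly
 the following (the rc.d script is the stock one; the test file path on
 the client is just a placeholder):
 
     # stop the running daemons, then start a single nfsd thread by hand
     /etc/rc.d/nfsd stop
     nfsd -u -t -n 1
     # from the client, time a large sequential read
     pv /mnt/nfs/bigfile > /dev/null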
 
 >> For instance, copying a 4GB file over NFSv3 from a ZFS filesystem with the
 >> following flags
 >> [rw,nosuid,hard,intr,nofsc,tcp,vers=3,rsize=8192,wsize=8192,sloppy,addr=X.X.X.X]
 >> (Linux client, the above is the server), I achieve 2 MB/s, fluctuating
 >> between 1 and 3. (pv reports 2.23 MB/s avg)
 >>
 >> Locally, on the server, I achieve 110-140 MB/s (at the end of pv, it
 >> reports 123 MB/s avg).
 >>
 >> I'd assume network latency, but nc with no flags other than port achieves
 >> 30-50 MB/s between server and client.
 >>
 >> Latency is also abysmal - ls on a randomly chosen homedir full of files,
 >> according to time, takes:
 >> real    0m15.634s
 >> user    0m0.012s
 >> sys     0m0.097s
 >> while on the local machine:
 >> real    0m0.266s
 >> user    0m0.007s
 >> sys     0m0.000s
 >
 > It probably is latency.  nfs is very latency-sensitive when there are lots
 > of small files.  Transfers of large files shouldn't be affected so much.
 
 Sure, and next on my TODO list is to look into whether 9.0-CURRENT makes
 certain high-latency ZFS operations perform better.
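 
 For completeness, the Linux-side mount behind the flag list quoted above
 was created with something along these lines (server name and paths are
 placeholders; flags such as nofsc, sloppy and addr= typically show up in
 the reported options without being requested explicitly):
 
     mount -t nfs -o rw,nosuid,hard,intr,tcp,vers=3,rsize=8192,wsize=8192 \
         server:/tank/export /mnt/nfs
     # throughput figures come from streaming the file through pv
     pv /mnt/nfs/testfile > /dev/null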
 
 >> The server in question is a 3GHz Core 2 Duo, running FreeBSD RELENG_8. The
 >> kernel conf, DTRACE_POLL, is just the stock AMD64 kernel with all of the
 >> DTRACE-related options turned on, as well as the option to enable polling
 >> in the NIC drivers, since we were wondering if that would improve our
 >> performance.
 >
 > Enabling polling is a good way to destroy latency.  A ping latency of
 > more than about 50uS causes noticeable loss of performance for nfs, but
 > LAN latency is usually a few times higher than that, and polling without
 > increasing the clock interrupt frequency to an excessively high value
 > gives a latency at least 20 times higher than that.  Also, -current
 > with debugging options is so bloated that even localhost has a ping
 > latency of about 50uS on a Core2 (up from 2uS for FreeBSD-4 on an
 > AthlonXP).  Anyway try nfs on localhost to see if reducing the latency
 > helps.
 
 Actually, we noticed that throughput appeared to get marginally better
 while causing occasional bursts of crushing latency, but yes, we have it
 compiled into the kernel without enabling it on any actual NICs at
 present. :)
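 
 To be concrete, "compiled into the kernel" means the DEVICE_POLLING
 option; actually using it is a per-interface switch, roughly as below
 (em0 is only an example device, and the HZ line is the usual companion
 setting rather than something specific to DTRACE_POLL):
 
     # kernel configuration
     options DEVICE_POLLING
     options HZ=1000        # polling latency scales with the clock rate
     # turn polling on or off for a given NIC at runtime
     ifconfig em0 polling
     ifconfig em0 -polling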
 
 But yes, on localhost IPv4 with no additional mount flags I'm getting
 40-90+ MB/s, occasionally slowing to 20-30 MB/s (average after copying a
 6.5 GB file: 52.7 MB/s). {r,w}size=8192 on localhost goes up to
 80-100 MB/s, with occasional sinks to 60 (average after copying another,
 separate 6.5 GB file: 77.3 MB/s).
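 
 On the FreeBSD side, a localhost mount like those above can be set up
 along these lines (the export path is a placeholder):
 
     # plain mount, no extra options
     mount -t nfs 127.0.0.1:/tank/export /mnt/loop
     # the {r,w}size=8192 variant
     mount -t nfs -o rsize=8192,wsize=8192 127.0.0.1:/tank/export /mnt/loop
     # read test
     pv /mnt/loop/bigfile > /dev/null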
 
 Also:
 64 bytes from 127.0.0.1: icmp_seq=0 ttl=64 time=0.015 ms
 64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.049 ms
 64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.012 ms
 64 bytes from [actual IP]: icmp_seq=0 ttl=64 time=0.019 ms
 64 bytes from [actual IP]: icmp_seq=1 ttl=64 time=0.015 ms
 
 >> We tested this with a UFS directory as well, because we were curious if
 >> this was an NFS/ZFS interaction - we still got 1-2 MB/s read speed and
 >> horrible latency while achieving fast throughput and latency local to the
 >> server, so we're reasonably certain it's not "just" ZFS, if there is indeed
 >> any interaction there.
 >
 > After various tuning and bug fixing (now partly committed by others) I get
 > improvements like the following on low-end systems with ffs (I don't use
 > zfs):
 > - very low end with 100Mbps ethernet: little change; bulk transfers always
 >   went at near wire speed (about 10 MB/S)
 > - low end with 1Gbps: bulk transfers up from 20MB/S to 45MB/S (local ffs
 >   50MB/S).  buildworld over nfs of a 5.2 world down from 1200 seconds to
 >   800 seconds (this one is very latency-sensitive.  Takes about 750
 >   seconds on local ffs).
 
 Is this on 9.0-CURRENT, or RELENG_8, or something else?
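 
 For what it's worth, the UFS-vs-ZFS comparison only needs the two
 exports on the server - something like the following, with placeholder
 paths and network (the ZFS dataset can just as well go through
 /etc/exports if sharenfs is left off):
 
     # share the ZFS dataset
     zfs set sharenfs=on tank/home
     # /etc/exports entry for the UFS test directory
     /ufs/testdir -network 10.0.0.0 -mask 255.255.255.0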
 
 >> Read speed of a randomly generated 6500 MB file on UFS over NFSv3 with the
 >> same flags as above: 1-3 MB/s, averaging 2.11 MB/s
 >> Read speed of the same file, local to the server: consistently between
 >> 40-60 MB/s, averaging 61.8 MB/s [it got faster over time - presumably UFS
 >> was aggressively caching the file, or something?]
 >
 > You should use a file size larger than the size of main memory to prevent
 > caching, especially for reads.  That is 1GB on my low-end systems.
 
 I didn't explicitly mention the server's RAM, but it has 6 GB of real
 RAM, and the files used were 6.5-7 GB each in that case (I did use a
 4GB file earlier - I've avoided doing that again here).
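 
 A larger-than-RAM random test file like the ones used here can be
 generated along these lines (the output path is a placeholder):
 
     # ~7 GB of random data, comfortably larger than the 6 GB of server RAM
     dd if=/dev/urandom of=/tank/export/testfile bs=1m count=7000
     # read it back through pv for a running throughput figure
     pv /tank/export/testfile > /dev/null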
 
 >> Read speed of the same file over NFS again, after the local test:
 >> Amusingly, worse (768 KB/s-2.2 MB/s, with random stalls - average reported
 >> 270 KB/s(!)).
 >
 > The random stalls are typical of the problem with the nfsd's getting
 > in each other's way, and/or of related problems.  The stalls that I
 > saw were very easy to see in real time using "netstat -I <interface>
 > 1" -- they happened every few seconds and lasted a second or 2.  But
 > they were never long enough to reduce the throughput by more than a
 > factor of 3, so I always got over 19 MB/S.  The throughput was reduced
 > by approximately the ratio of stalled time to non-stalled time.
 
 I believe it. I'm seeing at least partially similar behavior here - the
 performance drops I mentioned in the localhost case, where the transfer
 briefly pauses and then picks up again, even with nfsd -n 1 and
 nfsiod -n 1.
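 
 Watching for the stalls the same way is easy enough on the server -
 something like the following, where em0 is a placeholder interface and
 a stall shows up as a second or two of near-zero counters:
 
     netstat -I em0 1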
 
 - Rich

