misc/145189: nfsd performs abysmally under load

Garrett Cooper yanefbsd at gmail.com
Tue Mar 30 20:44:07 UTC 2010


On Tue, Mar 30, 2010 at 1:11 PM, Bruce Evans <brde at optusnet.com.au> wrote:
> On Tue, 30 Mar 2010, Rich wrote:
>
>> On Tue, Mar 30, 2010 at 11:50 AM, Bruce Evans <brde at optusnet.com.au>
>> wrote:
>
>>>> For instance, copying a 4GB file over NFSv3 from a ZFS filesystem
>>>> with the following flags
>>>>
>>>> [rw,nosuid,hard,intr,nofsc,tcp,vers=3,rsize=8192,wsize=8192,sloppy,addr=X.X.X.X]
>>>> (Linux client, the above is the server), I achieve 2 MB/s,
>>>> fluctuating between 1 and 3. (pv reports 2.23 MB/s avg)
>
> I also tried various nfs r/w sizes and tcp/udp.  The best sizes are
> probably the fs block size or twice that (normally 16K for ffs).  Old
> versions of FreeBSD had even more bugs in this area and gave surprising
> performance differences depending on the nfs r/w sizes or application
> i/o sizes.  In some cases smaller sizes worked best, apparently because
> they avoided the stalls.
>
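For anyone wanting to experiment with this, the sizes are set at mount
time on a FreeBSD client; a minimal sketch, with "server" and /export
as placeholders:

    # 16K matches the default ffs block size mentioned above
    mount -t nfs -o tcp,rsize=16384,wsize=16384 server:/export /mnt

(Swap tcp for udp to compare transports.)
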
>>>> ...
>>>
>>> Enabling polling is a good way to destroy latency.  A ping latency of
>>> ...
>
>> Actually, we noticed that throughput appeared to get marginally
>> better while causing occasional bursts of crushing latency, but yes,
>> we have it on in the kernel without using it in any actual NICs at
>> present. :)
>>
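I read "we have it on in the kernel" as a kernel built with "options
DEVICE_POLLING" but with no interface actually switched over; on recent
FreeBSD that part is done per-NIC, e.g. (em0 being a placeholder):

    ifconfig em0 polling      # enable polling on one interface
    ifconfig em0 -polling     # turn it back off
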
>> But yes, I'm getting 40-90+ MB/s, occasionally slowing to 20-30 MB/s
>> (52.7 MB/s average after copying a 6.5 GB file), on localhost IPv4,
>> with no additional mount flags. {r,w}size=8192 on localhost goes up
>> to 80-100 MB/s, with occasional sinks to 60 (average after copying
>> another, separate 6.5 GB file: 77.3 MB/s).
>
> I thought you said you often got 1-3MB/S.
>
>> Also:
>> 64 bytes from 127.0.0.1: icmp_seq=0 ttl=64 time=0.015 ms
>> 64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.049 ms
>> 64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.012 ms
>
> Fairly normal slowness for -current.
>
>> 64 bytes from [actual IP]: icmp_seq=0 ttl=64 time=0.019 ms
>> 64 bytes from [actual IP]: icmp_seq=1 ttl=64 time=0.015 ms
>
> Are these with external hardware NICs?  Then 15 uS is excellent.  Better
> than I've ever seen.  Very good hardware might be able to do this, but
> I suspect it is for the local machine.  BTW, I don't like the times
> being reported in ms and sub-uS times not being supported.  I sometimes
> run Linux or cygwin ping and don't like it not supporting sub-mS times,
> so that it always reports 0 for my average latency of 100 uS.
>
>>> After various tuning and bug fixing (now partly committed by others)
>>> I get improvements like the following on low-end systems with ffs
>>> (I don't use zfs):
>>> - very low end with 100Mbps ethernet: little change; bulk transfers
>>>   always went at near wire speed (about 10 MB/S)
>>> - low end with 1Gbps: bulk transfers up from 20MB/S to 45MB/S (local
>>>   ffs 50MB/S).  buildworld over nfs of 5.2 world down from 1200
>>>   seconds to 800 seconds (this one is very latency-sensitive.  Takes
>>>   about 750 seconds on local ffs).
>>
>> Is this on 9.0-CURRENT, or RELENG_8, or something else?
>
> Mostly with 7-CURRENT or 8-CURRENT a couple of years ago.  Sometimes with
> a ~5.2-SERVER.  nfs didn't vary much with the server, except there were
> surprising differences due to latency that I never tracked down.
>
> I forgot to mention another thing you can try easily:
>
> - negative name caching.  Improves latency.  I used this to reduce makeworld
>  times significantly, and it is now standard in -current but not
>  enabled by default.
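
For what it's worth, where negative name caching is exposed to the
client it appears as mount_nfs(8)'s negnametimeo option; I'm assuming
the -current code Bruce mentions uses the same knob:

    # cache failed name lookups for 60s; 0 disables negative caching
    mount -t nfs -o tcp,negnametimeo=60 server:/export /mnt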

Have you also tried tuning via sysctl (vfs.nfs*)?
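For example, something along these lines (vfs.nfs.access_cache_timeout
is one knob I believe exists on the old client; exact names vary by
branch and between the old and new NFS code):

    sysctl vfs.nfs                          # list client-side knobs
    sysctl vfs.nfsrv                        # list server-side knobs
    sysctl vfs.nfs.access_cache_timeout=60  # example tweak
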
Thanks,
-Garrett

