NFS Performance issue against NetApp

Marc G. Fournier scrappy at hub.org
Mon May 13 00:37:24 UTC 2013


On 2013-05-12 5:48 AM, Rick Macklem wrote:
> Marc G. Fournier wrote:
>>
>>> With
>>>
>>> vfs.nfs.noconsist=3 ... 385595ms
>>>
>>> nfsstat -z before startup, nfsstat -c after:
>>>
>>> Client Info:
>>> Rpc Counts:
>>>    Getattr   Setattr    Lookup  Readlink      Read     Write    Create    Remove
>>>     332594         5     17238         0    224426    231137      3743         1
>>>     Rename      Link   Symlink     Mkdir     Rmdir   Readdir  RdirPlus    Access
>>>          0         0         0       307         0        71         0      8447
>>>      Mknod    Fsstat    Fsinfo  PathConf    Commit
>>>          0       509         0         0         0
>>> Rpc Info:
>>>   TimedOut   Invalid X Replies   Retries  Requests
>>>          0         0         0         0    818479
>>> Cache Info:
>>>  Attr Hits    Misses Lkup Hits    Misses BioR Hits    Misses BioW Hits    Misses
>>>     608296    332596    526200     17245    -95425    224426     13178    231137
>>>  BioRLHits    Misses BioD Hits    Misses DirE Hits    Misses Accs Hits    Misses
>>>          0         0      1050        55       502         7    543340      8448

With patch applied:

Client Info:
Rpc Counts:
   Getattr   Setattr    Lookup  Readlink      Read     Write    Create    Remove
    236577         5     17311         0    233269    231136      3743         1
    Rename      Link   Symlink     Mkdir     Rmdir   Readdir  RdirPlus    Access
         0         0         0       307         0       391         0      8488
     Mknod    Fsstat    Fsinfo  PathConf    Commit
         0       543         1         0         0
Rpc Info:
  TimedOut   Invalid X Replies   Retries  Requests
         0         0         0         0    731770
Cache Info:
 Attr Hits    Misses Lkup Hits    Misses BioR Hits    Misses BioW Hits    Misses
    714778    236578    529160     17312   -104087    233068     13178    231136
 BioRLHits    Misses BioD Hits    Misses DirE Hits    Misses Accs Hits    Misses
         0         0       788       375       542         0    546435      8488

RPC Info Requests appear to be down, but the # of reads/writes/getattrs 
hasn't changed any ...

Why does FreeBSD take 34x as many reads, when rsize is the same on both 
Linux and FreeBSD ... ?   The amount of data to be read is the same ... 
shouldn't the # of reads be within the same ballpark, at least ... ?
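
To put numbers on the comparison, the counters from two runs can be diffed 
directly. What follows is only a minimal Python sketch and not something 
from the patch or from nfsstat itself: the helper name compare and the 
dictionary layout are hypothetical, and the values are copied by hand from 
the nfsstat -c output in this thread (the unpatched vfs.nfs.noconsist=0 run 
and the run with the patch applied).

```python
# RPC counts copied by hand from the nfsstat -c output in this thread.
# BASELINE is the unpatched vfs.nfs.noconsist=0 run, PATCHED is the run
# with the patch applied. Only the busiest counters are included.
BASELINE = {
    "Getattr": 236122, "Lookup": 17221, "Read": 230575,
    "Write": 230823, "Access": 8425, "Requests": 727799,
}
PATCHED = {
    "Getattr": 236577, "Lookup": 17311, "Read": 233269,
    "Write": 231136, "Access": 8488, "Requests": 731770,
}


def compare(before, after):
    """Return {name: (before, after, percent_change)} for shared counters.

    Hypothetical helper for illustration; percent_change is relative to
    the 'before' value and is NaN when that value is zero.
    """
    result = {}
    for name, b in before.items():
        if name not in after:
            continue
        a = after[name]
        pct = (a - b) / b * 100.0 if b else float("nan")
        result[name] = (b, a, pct)
    return result


if __name__ == "__main__":
    print(f"{'RPC':<10}{'before':>10}{'after':>10}{'change':>10}")
    for name, (b, a, pct) in compare(BASELINE, PATCHED).items():
        print(f"{name:<10}{b:>10}{a:>10}{pct:>+9.2f}%")
```

With these figures, the patched run issues about 1% more Read RPCs than the 
unpatched run, and Getattr and Write move by well under 1%, so the patch 
does not bring the read count anywhere near the Linux ballpark.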


> Ok, so disabling the mtime based cache consistency doesn't make
> much difference. Forget about that one.
>
> I've attached another patch (which you probably shouldn't use for
> a production system either) to be tried instead of the last one.
> (This one is basically "work in progress" by Alexander Kabaev for
>   better performance during file linking. I hope he doesn't mind
>   me posting it.)
>
> rick
>
>>> ============
>>>
>>> vfs.nfs.noconsist=2 ... 392201ms
>>>
>>> Client Info:
>>> Rpc Counts:
>>>    Getattr   Setattr    Lookup  Readlink      Read     Write    Create    Remove
>>>     332557         5     17228         0    224421    231131      3743         1
>>>     Rename      Link   Symlink     Mkdir     Rmdir   Readdir  RdirPlus    Access
>>>          0         0         0       307         0        72         0      8430
>>>      Mknod    Fsstat    Fsinfo  PathConf    Commit
>>>          0       502         0         0         0
>>> Rpc Info:
>>>   TimedOut   Invalid X Replies   Retries  Requests
>>>          0         0         0         0    818395
>>> Cache Info:
>>>  Attr Hits    Misses Lkup Hits    Misses BioR Hits    Misses BioW Hits    Misses
>>>     607834    332557    525801     17231    -95401    224421     13178    231131
>>>  BioRLHits    Misses BioD Hits    Misses DirE Hits    Misses Accs Hits    Misses
>>>          0         0      1028        56       502         0    542925      8431
>>>
>>>
>>> ============
>>> vfs.nfs.noconsist=0 ... 391622ms
>>>
>>>
>>> Client Info:
>>> Rpc Counts:
>>>    Getattr   Setattr    Lookup  Readlink      Read     Write    Create    Remove
>>>     236122         5     17221         0    230575    230823      3743         1
>>>     Rename      Link   Symlink     Mkdir     Rmdir   Readdir  RdirPlus    Access
>>>          0         0         0       307         0        71         0      8425
>>>      Mknod    Fsstat    Fsinfo  PathConf    Commit
>>>          0       516         0         0         0
>>> Rpc Info:
>>>   TimedOut   Invalid X Replies   Retries  Requests
>>>          0         0         0         0    727799
>>> Cache Info:
>>>  Attr Hits    Misses Lkup Hits    Misses BioR Hits    Misses BioW Hits    Misses
>>>     711860    236124    526549     17225   -101525    230490     13178    230823
>>>  BioRLHits    Misses BioD Hits    Misses DirE Hits    Misses Accs Hits    Misses
>>>          0         0      1057        55       516         0    543709      8425
>>>
>>>
>>> I checked a second time with noconsist=0, and the nfsstat -c values
>>> seem to come out pretty much the same ...
>>>
>>> I'm going to head down to the office and try again with Solaris (I'd
>>> have to re-install, since I used that system for the Solaris), and see
>>> what nfsstat -c results I get out of that ... will post a followup on
>>> this when completed ...
>>>
>>>
>>>
>>> On 2013-05-10 5:32 PM, Rick Macklem wrote:
>>>> Marc G. Fournier wrote:
>>>>> FYI … I just installed Solaris 11 onto the same hardware and ran the
>>>>> same test … so far, I'm seeing:
>>>>>
>>>>> Linux @ ~30s
>>>>> Solaris @ ~44s
>>>>>
>>>>> OpenBSD @ ~200s
>>>>> FreeBSD @ ~240s
>>>>>
>>>>> I've even tried FreeBSD 8.3 just to see if maybe it's a 'newish'
>>>>> issue … same as 9.x … I could see Linux 'cutting corners', but
>>>>> Oracle/Solaris too … ?
>>>>>
>>>> The three client implementations (BSD, Linux, Solaris) were developed
>>>> independently and, as such, will all implement somewhat different
>>>> caching algorithms (the RFCs specify what goes on the wire, but say
>>>> little w.r.t. client side caching).
>>>>
>>>> I have attached a patch that might be useful for determining if
>>>> the client side buffer cache consistency algorithm in FreeBSD is
>>>> causing the slow startup of jboss. Do not run this patch on a
>>>> production system, since it pretty well disables all buffer cache
>>>> coherency (i.e. if another client modifies a file, the patched client
>>>> won't notice and will continue to cache stale file data).
>>>>
>>>> If the patch does speed up startup of jboss significantly, you can
>>>> use the sysctl:
>>>>    vfs.nfs.noconsist
>>>> to check for which coherency check is involved by decreasing the
>>>> value for the sysctl by 1 and then trying a startup again. (When
>>>> vfs.nfs.noconsist=0, normal cache coherency will be applied.)
>>>>
>>>> I have no idea if buffer cache coherency is a factor, but trying
>>>> the attached patch might determine if it is.
>>>>
>>>> Note that you have never posted updated "nfsstat -c" values.
>>>> (Remember that what you posted indicated 88 RPCs, which seemed
>>>>    bogus.) Finding out if FreeBSD does a lot more of certain RPCs
>>>> than Linux/Solaris might help isolate what is going on.
>>>>
>>>> rick
>>>>
>>>>> On 2013-05-03, at 04:50 , Mark Felder <feld at feld.me> wrote:
>>>>>
>>>>>> On Thu, 02 May 2013 18:43:17 -0500, Marc G. Fournier
>>>>>> <scrappy at hub.org> wrote:
>>>>>>
>>>>>>> Hadn't thought to do so with Linux, but …
>>>>>>> Linux ……. 20732ms, 20117ms, 20935ms, 20130ms, 20560ms
>>>>>>> FreeBSD .. 28996ms, 24794ms, 24702ms, 23311ms, 24153ms
>>>>>> Please make sure both platforms are using similar atime settings.
>>>>>> I think most distros use ext4 with diratime by default. I'd just do
>>>>>> noatime on both platforms to be safe.
>>>>>> _______________________________________________
>>>>>> freebsd-fs at freebsd.org mailing list
>>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>>>>> To unsubscribe, send any mail to
>>>>>> "freebsd-fs-unsubscribe at freebsd.org"


