High Kernel Load with nfsv4

Loïc Blot loic.blot at unix-experience.fr
Mon Dec 15 09:07:45 UTC 2014


Hi Rick,
After talking with my manager, it turns out NFSv4 is required on our infrastructure. I tried upgrading the NFSv4+ZFS server from 9.3 to 10.1; I hope this will resolve some of the issues...

Regards,

Loïc Blot,
UNIX Systems, Network and Security Engineer
http://www.unix-experience.fr

On 10 December 2014 at 15:36, "Loïc Blot" <loic.blot at unix-experience.fr> wrote:
> Hi Rick,
> thanks for your suggestion.
> Regarding my locking bug: rpc.lockd is stuck in the rpcrecv state on the server. kill -9 has no
> effect on the process; it stays blocked (state: Ds).
> 
> As for performance:
> 
> NFSv3: 60Mbps
> NFSv4: 45Mbps
> 
> Regards,
> 
> Loïc Blot,
> UNIX Systems, Network and Security Engineer
> http://www.unix-experience.fr
> 
> On 10 December 2014 at 13:56, "Rick Macklem" <rmacklem at uoguelph.ca> wrote:
> 
>> Loic Blot wrote:
>> 
>>> Hi Rick,
>>> I'm trying NFSv3.
>>> Some jails start fine, but after a few minutes I now have an issue
>>> with lockd:
>>> 
>>> nfs server 10.10.X.8:/jails: lockd not responding
>>> nfs server 10.10.X.8:/jails lockd is alive again
>>> 
>>> I looked at the mbuf usage, but it seems there is no problem there.
>> 
>> Well, if you need locks to be visible across multiple clients, then
>> I'm afraid you are stuck with using NFSv4 and the performance you get
>> from it. (There is no way to do file handle affinity for NFSv4 because
>> the read and write ops are buried in the compound RPC and not easily
>> recognized.)
>> 
>> If the locks don't need to be visible across multiple clients, I'd
>> suggest trying the "nolockd" option with nfsv3.
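For illustration, an NFSv3 mount using the "nolockd" option suggested above might be assembled as in the sketch below. The server name, export path and mount point are placeholder assumptions, not values from the thread, and the function only prints the command so it can be reviewed before being run as root.

```shell
#!/bin/sh
# Print (do not run) an NFSv3 mount command that keeps locks local to the client.
# Both arguments are placeholders supplied by the caller, not values from the thread.
nfsv3_nolockd_cmd() {
    export_spec=$1   # e.g. "nfs-server:/jails" (hypothetical)
    mount_point=$2   # e.g. "/jails" (hypothetical)
    printf 'mount -t nfs -o nfsv3,tcp,nolockd %s %s\n' "$export_spec" "$mount_point"
}

nfsv3_nolockd_cmd "nfs-server:/jails" "/jails"
```

With "nolockd", locks are handled locally on each client, so this only fits the case where locks do not need to be visible across clients.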
>> 
>>> Here is my rc.conf on server:
>>> 
>>> nfs_server_enable="YES"
>>> nfsv4_server_enable="YES"
>>> nfsuserd_enable="YES"
>>> nfsd_server_flags="-u -t -n 256"
>>> mountd_enable="YES"
>>> mountd_flags="-r"
>>> nfsuserd_flags="-usertimeout 0 -force 20"
>>> rpcbind_enable="YES"
>>> rpc_lockd_enable="YES"
>>> rpc_statd_enable="YES"
>>> 
>>> Here is the client:
>>> 
>>> nfsuserd_enable="YES"
>>> nfsuserd_flags="-usertimeout 0 -force 20"
>>> nfscbd_enable="YES"
>>> rpc_lockd_enable="YES"
>>> rpc_statd_enable="YES"
>>> 
>>> Do you have any ideas?
>>> 
>>> Regards,
>>> 
>>> Loïc Blot,
>>> UNIX Systems, Network and Security Engineer
>>> http://www.unix-experience.fr
>>> 
>>> On 9 December 2014 at 04:31, "Rick Macklem" <rmacklem at uoguelph.ca> wrote:
>>>> Loic Blot wrote:
>>>> 
>>>>> Hi Rick,
>>>>> 
>>>>> I waited 3 hours (no lag at jail launch) and then ran: sysrc
>>>>> memcached_flags="-v -m 512"
>>>>> The command was very, very slow...
>>>>> 
>>>>> Here is a dd over NFS:
>>>>> 
>>>>> 601062912 bytes transferred in 21.060679 secs (28539579 bytes/sec)
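To compare this dd figure with the other rates quoted in the thread, the byte count and elapsed time can be converted into decimal MB/s and Mbit/s. The helper name `throughput` is an assumption made for this sketch.

```shell
#!/bin/sh
# Convert a byte count and elapsed seconds into MB/s and Mbit/s (decimal units).
throughput() {
    awk -v bytes="$1" -v secs="$2" 'BEGIN {
        bps = bytes / secs
        printf "%.1f MB/s, %.1f Mbit/s\n", bps / 1e6, bps * 8 / 1e6
    }'
}

# Figures taken from the dd output above.
throughput 601062912 21.060679
```

For the dd run above this works out to roughly 28.5 MB/s, or about 228 Mbit/s.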
>>>> 
>>>> Can you try the same read using an NFSv3 mount?
>>>> (If it runs much faster, you have probably been bitten by the ZFS
>>>> "sequential vs random" read heuristic, which I've been told thinks
>>>> NFS is doing "random" reads without file handle affinity. File
>>>> handle affinity is very hard to do for NFSv4, so it isn't done.)
>> 
>> I was actually suggesting that you try the "dd" over nfsv3 to see how
>> the performance compared with nfsv4. If you do that, please post the
>> comparable results.
>> 
>> Someday I would like to try and get ZFS's sequential vs random read
>> heuristic modified and any info on what difference in performance that
>> might make for NFS would be useful.
>> 
>> rick
>> 
>>>> rick
>>>> 
>>>>> This is quite slow...
>>>>> 
>>>>> You can find some nfsstat output below (the command hasn't finished yet):
>>>>> 
>>>>> nfsstat -c -w 1
>>>>> 
>>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>>> 0 0 0 0 0 0 0 0
>>>>> 4 0 0 0 0 0 16 0
>>>>> 2 0 0 0 0 0 17 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 4 0 0 0 0 4 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 4 0 0 0 0 0 3 0
>>>>> 0 0 0 0 0 0 3 0
>>>>> 37 10 0 8 0 0 14 1
>>>>> 18 16 0 4 1 2 4 0
>>>>> 78 91 0 82 6 12 30 0
>>>>> 19 18 0 2 2 4 2 0
>>>>> 0 0 0 0 2 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 1 0 0 0 0 1 0
>>>>> 4 6 0 0 6 0 3 0
>>>>> 2 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 1 0 0 0 0 0 0 0
>>>>> 0 0 0 0 1 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 6 108 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 98 54 0 86 11 0 25 0
>>>>> 36 24 0 39 25 0 10 1
>>>>> 67 8 0 63 63 0 41 0
>>>>> 34 0 0 35 34 0 0 0
>>>>> 75 0 0 75 77 0 0 0
>>>>> 34 0 0 35 35 0 0 0
>>>>> 75 0 0 74 76 0 0 0
>>>>> 33 0 0 34 33 0 0 0
>>>>> 0 0 0 0 5 0 0 0
>>>>> 0 0 0 0 0 0 6 0
>>>>> 11 0 0 0 0 0 11 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 17 0 0 0 0 1 0
>>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>>> 4 5 0 0 0 0 12 0
>>>>> 2 0 0 0 0 0 26 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 4 0 0 0 0 4 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 4 0 0 0 0 0 2 0
>>>>> 2 0 0 0 0 0 24 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 4 0 0 0 0 0 7 0
>>>>> 2 1 0 0 0 0 1 0
>>>>> 0 0 0 0 2 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 6 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 4 6 0 0 0 0 3 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 2 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 4 71 0 0 0 0 0 0
>>>>> 0 1 0 0 0 0 0 0
>>>>> 2 36 0 0 0 0 1 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 1 0 0 0 0 0 1 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 79 6 0 79 79 0 2 0
>>>>> 25 0 0 25 26 0 6 0
>>>>> 43 18 0 39 46 0 23 0
>>>>> 36 0 0 36 36 0 31 0
>>>>> 68 1 0 66 68 0 0 0
>>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>>> 36 0 0 36 36 0 0 0
>>>>> 48 0 0 48 49 0 0 0
>>>>> 20 0 0 20 20 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 3 14 0 1 0 0 11 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 4 0 0 0 0 4 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 4 22 0 0 0 0 16 0
>>>>> 2 0 0 0 0 0 23 0
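Per-second samples like these are easier to compare once they are totalled per RPC type. The sketch below sums each column of `nfsstat -c -w 1` style output read from stdin. The function name `sum_rpcs` is an assumption, and it expects the raw command output without the mail quote prefixes.

```shell
#!/bin/sh
# Sum each column of `nfsstat -c -w 1` style output read from stdin.
# Header lines (starting with "GtAttr") and lines without 8 fields are skipped.
sum_rpcs() {
    awk '
        $1 == "GtAttr" || NF != 8 { next }
        { for (i = 1; i <= 8; i++) total[i] += $i }
        END {
            printf "GtAttr=%d Lookup=%d Rdlink=%d Read=%d Write=%d Rename=%d Access=%d Rddir=%d\n",
                total[1], total[2], total[3], total[4], total[5], total[6], total[7], total[8]
        }'
}

# Three sample rows copied from the output above.
sum_rpcs <<'EOF'
GtAttr Lookup Rdlink Read Write Rename Access Rddir
37 10 0 8 0 0 14 1
18 16 0 4 1 2 4 0
78 91 0 82 6 12 30 0
EOF
```

On a live client the same function could be fed from a saved capture, for example a file written while the slow command was running.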
>>>>> 
>>>>> Regards,
>>>>> 
>>>>> Loïc Blot,
>>>>> UNIX Systems, Network and Security Engineer
>>>>> http://www.unix-experience.fr
>>>>> 
>>>>> On 8 December 2014 at 09:36, "Loïc Blot" <loic.blot at unix-experience.fr> wrote:
>>>>>> Hi Rick,
>>>>>> I stopped the jails this weekend and started them again this morning;
>>>>>> I'll give you some stats this week.
>>>>>> 
>>>>>> Here is my nfsstat -m output (with your rsize/wsize tweaks):
>>>>>> 
>>>>>> nfsv4,tcp,resvport,hard,cto,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,
>>>>>> nametimeo=60,negnametimeo=60,rsize=32768,wsize=32768,readdirsize=32768,readahead=1,
>>>>>> wcommitsize=773136,timeout=120,retrans=2147483647
>>>>>> 
>>>>>> On the server side, my disks sit behind a RAID controller that
>>>>>> presents a volume with 512-byte sectors, and write performance is
>>>>>> quite decent (dd if=/dev/zero of=/jails/test.dd bs=4096
>>>>>> count=100000000 => 450MBps)
>>>>>> 
>>>>>> Regards,
>>>>>> 
>>>>>> Loïc Blot,
>>>>>> UNIX Systems, Network and Security Engineer
>>>>>> http://www.unix-experience.fr
>>>>>> 
>>>>>> On 5 December 2014 at 15:14, "Rick Macklem" <rmacklem at uoguelph.ca> wrote:
>>>>>> 
>>>>>>> Loic Blot wrote:
>>>>>>> 
>>>>>>>> Hi,
>>>>>>>> I'm trying to create a virtualisation environment based on jails.
>>>>>>>> The jails are stored in a big ZFS pool on a FreeBSD 9.3 server,
>>>>>>>> which exports an NFSv4 volume. This NFSv4 volume is mounted on a
>>>>>>>> big hypervisor (2 Xeon E5v3, 128GB of memory and 8 ports, of which
>>>>>>>> only 1 is used at this time).
>>>>>>>> 
>>>>>>>> The problem is simple: my hypervisor runs 6 jails (using
>>>>>>>> approximately 1% CPU, 10GB of RAM and less than 1MB of bandwidth).
>>>>>>>> It works fine at first, but the system slows down and becomes
>>>>>>>> unusable after 2-3 days. When I look at top, I see 80-100% system
>>>>>>>> time and commands are very, very slow. Many processes are tagged
>>>>>>>> with nfs_cl*.
>>>>>>> 
>>>>>>> To be honest, I would expect the slowness to be because of slow
>>>>>>> response from the NFSv4 server, but if you do:
>>>>>>> # ps axHl
>>>>>>> on a client when it is slow and post that, it would give us some
>>>>>>> more information on where the client side processes are sitting.
>>>>>>> If you also do something like:
>>>>>>> # nfsstat -c -w 1
>>>>>>> and let it run for a while, that should show you how many RPCs are
>>>>>>> being done and which ones.
>>>>>>> 
>>>>>>> # nfsstat -m
>>>>>>> will show you what your mount is actually using.
>>>>>>> The only mount option I can suggest trying is
>>>>>>> "rsize=32768,wsize=32768", since some network environments have
>>>>>>> difficulties with 64K.
>>>>>>> 
>>>>>>> There are a few things you can try on the NFSv4 server side, if it
>>>>>>> appears that the clients are generating a large RPC load.
>>>>>>> - disabling the DRC cache for TCP by setting vfs.nfsd.cachetcp=0
>>>>>>> - If the server is seeing a large write RPC load, then
>>>>>>>   "sync=disabled" might help, although it does run a risk of data
>>>>>>>   loss when the server crashes.
>>>>>>> Then there are a couple of other ZFS related things (I'm not a ZFS
>>>>>>> guy, but these have shown up on the mailing lists).
>>>>>>> - make sure your volumes are 4K aligned and ashift=12 (in case a
>>>>>>>   drive that uses 4K sectors is pretending to be 512byte sectored)
>>>>>>> - never run over 70-80% full if write performance is an issue
>>>>>>> - use a zil on an SSD with good write performance
>>>>>>> The only NFSv4 thing I can tell you is that it is known that ZFS's
>>>>>>> algorithm for determining sequential vs random I/O fails for NFSv4
>>>>>>> during writing and this can be a performance hit. The only
>>>>>>> workaround is to use NFSv3 mounts, since file handle affinity
>>>>>>> apparently fixes the problem and this is only done for NFSv3.
>>>>>>> 
>>>>>>> rick
>>>>>>> 
>>>>>>>> I saw that there are TSO issues with igb, so I tried disabling it
>>>>>>>> with sysctl, but that didn't solve the problem.
>>>>>>>> 
>>>>>>>> Does anyone have any ideas? I can give you more information if
>>>>>>>> you need it.
>>>>>>>> 
>>>>>>>> Thanks in advance.
>>>>>>>> Regards,
>>>>>>>> 
>>>>>>>> Loïc Blot,
>>>>>>>> UNIX Systems, Network and Security Engineer
>>>>>>>> http://www.unix-experience.fr
>>>>>>>> _______________________________________________
>>>>>>>> freebsd-fs at freebsd.org mailing list
>>>>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>>>>>>> To unsubscribe, send any mail to
>>>>>>>> "freebsd-fs-unsubscribe at freebsd.org"