nfsd server cache flooded, try to increase nfsrc_floodlevel

Zack Kirsch zack.kirsch at isilon.com
Wed Jul 20 18:48:41 UTC 2011


Just wanted to add a bit of Isilon color. We've hit this limit before, but I believe it was mostly due to strange client behavior: 1) using a new lockowner for each lock, and 2) using a new TCP connection for each 'test run'. As far as I know, we haven't hit this in the field.

We've done a few things to combat this problem:
1) We increased the floodlevel to 65536.
2) We made the floodlevel configurable via sysctl (roughly along the lines of the sketch below).
3) We made significant changes to the replay cache itself; the main gains were drastic performance improvements and the freeing of cache entries left behind by stale TCP connections.
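
For reference, the sysctl hookup looks roughly like the following. This is a minimal sketch rather than our exact patch; the knob name and placement are illustrative:

#include <sys/param.h>
#include <sys/kernel.h>
#include <sys/sysctl.h>

/* Replay cache flood level; 65536 is the value we settled on. */
static int nfsrc_floodlevel = 65536;

SYSCTL_DECL(_vfs_nfsd);
SYSCTL_INT(_vfs_nfsd, OID_AUTO, nfsrc_floodlevel, CTLFLAG_RW,
    &nfsrc_floodlevel, 0,
    "Replay cache size at which the server considers itself flooded");

With something like that in place, the limit can be raised at runtime (e.g. "sysctl vfs.nfsd.nfsrc_floodlevel=131072") instead of requiring a recompile.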

I'd like to upstream all of this, but it will take some time, and obviously won't happen until stable9 branches.

Zack

-----Original Message-----
From: owner-freebsd-fs at freebsd.org [mailto:owner-freebsd-fs at freebsd.org] On Behalf Of Rick Macklem
Sent: Wednesday, July 20, 2011 6:30 AM
To: Clinton Adams
Cc: freebsd-fs at freebsd.org
Subject: Re: nfsd server cache flooded, try to increase nfsrc_floodlevel

Clinton Adams wrote:
> On Wed, Jul 20, 2011 at 1:09 AM, Rick Macklem <rmacklem at uoguelph.ca>
> wrote:
> > Please try the patch, which is at:
> >   http://people.freebsd.org/~rmacklem/noopen.patch
> > (This patch is against the file in -current, so patch may not like 
> > it, but
> >  it should be easy to do by hand, if patch fails.)
> >
> > Again, good luck with it and please let me know how it goes, rick
> >
> 
> Thanks for your help with this; trying the patches now. Tests with one
> client look good so far, and the values for OpenOwner and CacheSize are
> more in line. We'll test with more clients later today. We were hitting
> the nfsrc_floodlevel with just three clients before, all using
> NFSv4-mounted home and other directories. Clients are running Ubuntu
> 10.04.2 LTS. Usage has been general desktop use, nothing unusual that
> we could see.
> 
> Relevant snippet of /proc/mounts on client (rsize,wsize are being 
> automatically negotiated, not specified in the automount options):
> pez.votesmart.org:/public /export/public nfs4 rw,relatime,vers=4,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=krb5,clientaddr=192.168.255.112,minorversion=0,addr=192.168.255.25 0 0
> pez.votesmart.org:/home/clinton /home/clinton nfs4 rw,relatime,vers=4,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=krb5,clientaddr=192.168.255.112,minorversion=0,addr=192.168.255.25 0 0
> 
> nfsstat -e -s, with patches, after some stress testing:
> Server Info:
>   Getattr   Setattr    Lookup  Readlink      Read     Write    Create    Remove
>     95334         1     28004        50    297125         2         0         0
>    Rename      Link   Symlink     Mkdir     Rmdir   Readdir  RdirPlus    Access
>         0         0         0         0         0      1242         0      1444
>     Mknod    Fsstat    Fsinfo  PathConf    Commit   LookupP   SetClId SetClIdCf
>         0         0         0         0         2         0         4         4
>      Open  OpenAttr OpenDwnGr  OpenCfrm DelePurge   DeleRet     GetFH      Lock
>    176735         0         0     21175         0         0     49171         0
>     LockT     LockU     Close    Verify   NVerify     PutFH  PutPubFH PutRootFH
>         0         0     21184         0         0    549853         0        17
>     Renew RestoreFH    SaveFH   Secinfo RelLckOwn  V4Create
>         0     21186    176735         0         0         0
> Server:
> Retfailed    Faults   Clients
>         0         0         1
> OpenOwner     Opens LockOwner     Locks    Delegs
>       291         2         0         0         0
> Server Cache Stats:
>    Inprog      Idem  Non-idem    Misses CacheSize   TCPPeak
>         0         0         0    549969       291      2827
> 
Yes, these stats look reasonable.
(and sorry if the mail system I use munged the whitespace)

> nfsstat -e -s, prior to patches, general usage:
> 
> Server Info:
>   Getattr   Setattr    Lookup  Readlink      Read     Write    Create    Remove
>   2813477     62661    382636      1419    837492   2115422         0     33976
>    Rename      Link   Symlink     Mkdir     Rmdir   Readdir  RdirPlus    Access
>     31164      1310         0         0         0     15678        10    307236
>     Mknod    Fsstat    Fsinfo  PathConf    Commit   LookupP   SetClId SetClIdCf
>         0         0         2         1    144550         0        43        43
>      Open  OpenAttr OpenDwnGr  OpenCfrm DelePurge   DeleRet     GetFH      Lock
>   1462595         0       595     11267         0         0    550761    280674
>     LockT     LockU     Close    Verify   NVerify     PutFH  PutPubFH PutRootFH
>       155    212299    286615         0         0   6651006         0      1234
>     Renew RestoreFH    SaveFH   Secinfo RelLckOwn  V4Create
>    256784    320761   1495805         0         0       738
> Server:
> Retfailed    Faults   Clients
>         0         0         3
> OpenOwner     Opens LockOwner     Locks    Delegs
>         6       178      8012         2         0
> Server Cache Stats:
>    Inprog      Idem  Non-idem    Misses CacheSize   TCPPeak
>         0         0        96   6876610      8084     13429
> 
Hmm. LockOwners have the same property as OpenOwners, in that the server is required to hold onto the last reply in the cache until the Open/Lock Owner is released. Unfortunately, a server can't release a LockOwner until either the client issues a ReleaseLockOwner operation to tell the server that it will no longer use the LockOwner, or the associated open is closed.

These stats suggest that the client tried to do byte-range locking over 8000 times with different LockOwners (I don't know why the Linux client decided to use a different LockOwner each time), for file(s) that were still open. (When I test using the Fedora 15 client, I do see ReleaseLockOwner operations, but usually just before a close; I don't know how recently that was added to the Linux client. ReleaseLockOwner was added just before the RFC was published, to deal with the situation where a client uses a lot of LockOwners that the server must otherwise hold onto until the file is closed.)
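
To illustrate the kind of client-side pattern that could produce this (purely hypothetical; I have no idea if this resembles what your clients are doing): on most NFSv4 clients, POSIX locks taken by different processes map to different LockOwners on the wire, so a parent that keeps a file open while short-lived children lock/unlock it will accumulate LockOwners on the server until the file is closed or the client does ReleaseLockOwner. Something like:

#include <fcntl.h>
#include <sys/wait.h>
#include <unistd.h>

int
main(void)
{
	/* The parent opens once and holds the file open throughout. */
	int fd = open("/mnt/nfs4/somefile", O_RDWR);	/* path is made up */
	int i;

	if (fd < 0)
		return (1);
	for (i = 0; i < 8000; i++) {
		if (fork() == 0) {
			/* Each child process shows up as a new LockOwner. */
			struct flock fl = { .l_type = F_WRLCK,
			    .l_whence = SEEK_SET, .l_start = 0, .l_len = 1 };

			(void)fcntl(fd, F_SETLKW, &fl);
			fl.l_type = F_UNLCK;
			(void)fcntl(fd, F_SETLK, &fl);
			_exit(0);
		}
		wait(NULL);
	}
	/*
	 * Until this close (or a ReleaseLockOwner from the client), the
	 * server must cache the last reply for every LockOwner above.
	 */
	close(fd);
	return (0);
}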

If this is legitimate, all that can be done is to increase NFSRVCACHE_FLOODLEVEL and hope that you can find a value large enough that the clients don't bump into it, without exhausting mbufs. (I'd increase "kern.ipc.nmbclusters" to something larger than what you set NFSRVCACHE_FLOODLEVEL to; see the sketch below.)
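
For concreteness, here is roughly what that tuning looks like; the 65536 is only an example value, and double-check your tree for where the constant lives (I believe it is sys/fs/nfs/nfsrvcache.h):

/*
 * sys/fs/nfs/nfsrvcache.h: bump the compile-time flood level
 * (the stock default is considerably lower).
 */
#define	NFSRVCACHE_FLOODLEVEL	65536

/*
 * And in /boot/loader.conf, keep mbuf clusters well above that, e.g.:
 *	kern.ipc.nmbclusters="131072"
 */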

However, I suspect the count of 8084 LockOwners is a result of some other problem. Fingers and toes crossed that it was a side effect of the cache SMP bugs fixed by cache.patch. (noopen.patch won't help for this case, because it appears to be lockowners and not openowners that are holding the cached entries, but it won't do any harm, either.)

If you see very large LockOwner counts again with the patched kernel, all I can suggest is doing a packet capture and emailing it to me. Running "tcpdump -s 0 -w xxx" on the server, for a short enough time that "xxx" doesn't get huge, might catch some issue (like the client retrying a lock over and over and over again). A packet capture might also show whether the Ubuntu client is doing ReleaseLockOwner operations. (Btw, you can look at the trace using wireshark, which knows about NFSv4.)

In summary, it'll be interesting to see how this goes, rick
ps: Sorry about the long-winded reply, but this is NFSv4 after all :-)
