FreeBSD 9.1 NFSv4 client attribute cache not caching?

Rick Macklem rmacklem at uoguelph.ca
Mon Apr 15 01:32:10 UTC 2013


Paul van der Zwan wrote:
> On 14 Apr 2013, at 5:00 , Rick Macklem <rmacklem at uoguelph.ca> wrote:
> 
> 
> Thanks for taking the effort to send such an extensive reply.
> 
> > Paul van der Zwan wrote:
> >> On 12 Apr 2013, at 16:28 , Paul van der Zwan <paulz at vanderzwan.org>
> >> wrote:
> >>
> >>>
> >>> I am running a few VirtualBox VMs with 9.1 on my OpenIndiana
> >>> server
> >>> and I noticed that make buildworld seems to take much longer
> >>> when the clients mount /usr/src and /usr/obj over NFS V4 than when
> >>> they use V3.
> >>> Unfortunately I have to use V4 as a buildworld on V3 hangs the
> >>> server completely...
> >>> I noticed the number of PUTFH/GETATTR/GETFH calls is in the order
> >>> of a few thousand per second,
> >>> and if I snoop the traffic I see the same filenames appear over
> >>> and
> >>> over again.
> >>> It looks like the client is not caching anything at all and is
> >>> issuing a server request every time.
> >>> I use the default mount options:
> >>> 192.168.178.24:/data/ports on /usr/ports (nfs, nfsv4acls)
> >>> 192.168.178.24:/data/src on /usr/src (nfs, nfsv4acls)
> >>> 192.168.178.24:/data/obj on /usr/obj (nfs, nfsv4acls)
> >>>
> >>>
> >>
> >> I had a look with dtrace
> >> $ sudo dtrace -n '::getattr:start { @[stack()]=count();}'
> >> and it seems the vast majority of the calls to getattr are from
> >> open()
> >> and close() system calls:
> >> kernel`newnfs_request+0x631
> >> kernel`nfscl_request+0x75
> >> kernel`nfsrpc_getattr+0xbe
> >> kernel`nfs_getattr+0x280
> >> kernel`VOP_GETATTR_APV+0x74
> >> kernel`nfs_lookup+0x3cc
> >> kernel`VOP_LOOKUP_APV+0x74
> >> kernel`lookup+0x69e
> >> kernel`namei+0x6df
> >> kernel`kern_execve+0x47a
> >> kernel`sys_execve+0x43
> >> kernel`amd64_syscall+0x3bf
> >> kernel`0xffffffff80784947
> >> 26
> >>
> >> kernel`newnfs_request+0x631
> >> kernel`nfscl_request+0x75
> >> kernel`nfsrpc_getattr+0xbe
> >> kernel`nfs_close+0x3e9
> >> kernel`VOP_CLOSE_APV+0x74
> >> kernel`kern_execve+0x15c5
> >> kernel`sys_execve+0x43
> >> kernel`amd64_syscall+0x3bf
> >> kernel`0xffffffff80784947
> >> 26
> >>
> >> kernel`newnfs_request+0x631
> >> kernel`nfscl_request+0x75
> >> kernel`nfsrpc_getattr+0xbe
> >> kernel`nfs_getattr+0x280
> >> kernel`VOP_GETATTR_APV+0x74
> >> kernel`nfs_lookup+0x3cc
> >> kernel`VOP_LOOKUP_APV+0x74
> >> kernel`lookup+0x69e
> >> kernel`namei+0x6df
> >> kernel`vn_open_cred+0x330
> >> kernel`vn_open+0x1c
> >> kernel`kern_openat+0x207
> >> kernel`kern_open+0x19
> >> kernel`sys_open+0x18
> >> kernel`amd64_syscall+0x3bf
> >> kernel`0xffffffff80784947
> >> 2512
> >>
> >> kernel`newnfs_request+0x631
> >> kernel`nfscl_request+0x75
> >> kernel`nfsrpc_getattr+0xbe
> >> kernel`nfs_close+0x3e9
> >> kernel`VOP_CLOSE_APV+0x74
> >> kernel`vn_close+0xee
> >> kernel`vn_closefile+0xff
> >> kernel`_fdrop+0x3a
> >> kernel`closef+0x332
> >> kernel`kern_close+0x183
> >> kernel`sys_close+0xb
> >> kernel`amd64_syscall+0x3bf
> >> kernel`0xffffffff80784947
> >> 2530
> >>
> >> I had a look at the source of nfs_close and could not find a call
> >> to
> >> nfsrpc_getattr, and I am wondering why close would be calling
> >> getattr
> >> anyway.
> >> If the file is closed, what do we care about its attributes....
> >>
> > Here are some random statements w.r.t. NFSv3 vs NFSv4 that might
> > help
> > with an understanding of what is going on. I do address the specific
> > case of nfs_close() towards the end. (It is kinda long-winded, but I
> > threw out everything I could think of...)
> >
> > NFSv3 doesn't have any open/close RPC, but NFSv4 does have Open and
> > Close operations.
> >
> > In NFSv3, each RPC is defined and usually includes attributes for
> > files
> > before and after the operation (implicit getattrs not counted in the
> > RPC
> > counts reported by nfsstat).
> >
> > For NFSv4, every RPC is a compound built up of a list of Operations
> > like
> > Getattr. Since the NFSv4 server doesn't know what the compound is
> > doing,
> > nfsstat reports the counts of Operations for the NFSv4 server, so
> > the counts
> > will be much higher than with NFSv3, but do not reflect the number
> > of RPCs being done.
> > To get NFSv4 nfsstat output that can be compared to NFSv3, you need
> > to
> > do the command on the client(s) and it still is only roughly the
> > same.
> > (I just realized this should be documented in man nfsstat.)
> >
> I ran nfsstat -s -v 4 on the server and saw the number of requests
> being done.
> They were in the order of a few thousand per second for a single
> FreeBSD 9.1 client
> doing a make buildworld.
> 
Yes, but as I noted above, for NFSv4, these are counts of operations,
not RPCs. Each RPC in NFSv4 consists of several operations. For example,
for read it is something like:
- PutFH, Read, Getattr

As such, you need to do "nfsstat -e -c" on the client in order to
see how many RPCs are happening.
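
To put the numbers side by side, something like the following works
(the client command is the one that matters here; for a FreeBSD server
the analogous command would be "nfsstat -e -s", while your OpenIndiana
server has its own flags, like the "nfsstat -s -v 4" you already ran):

  # on the FreeBSD 9.1 client: RPC counts for the new NFS code,
  # roughly comparable to an NFSv3 run of the same build
  nfsstat -e -c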

> > For the FreeBSD NFSv4 client, the compounds include Getattr
> > operations
> > similar to what NFSv3 does. It doesn't do a Getattr on the directory
> > for Lookup, because that would have made the compound much more
> > complex.
> > I don't think this will have a significant performance impact, but
> > will
> > result in some additional Getattr RPCs.
> >
> I ran snoop on port 2049 on the server and I saw a large number of
> lookups.
> A lot of them seem to be for directories which are part of the
> filenames of
> the compiler and include files, which are on the NFS-mounted /usr/obj.
> The same names keep reappearing, so it looks like there is no caching
> being done on the client.
> 
Well, the name caching code is virtually identical to what is used
for NFSv3 and I have compared RPC counts (using client stats) in
the past (some while ago), to see if they are comparable.

A name cache entry (like everything in NFS caching) is only valid
for some amount of time (there are mount options for adjusting the
cache timeouts).
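
If you want to experiment with those timeouts, the knobs are the usual
attribute cache options documented in mount_nfs(8). A sketch, with the
values purely illustrative rather than a recommendation:

  mount -t nfs -o nfsv4,acregmin=3,acregmax=60,acdirmin=3,acdirmax=60 \
      192.168.178.24:/data/src /usr/src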

Now, I'm not saying it isn't broken. I'll take a look when I get
home, but it is also rather hard to tell when it is broken.
Since NFS has no cache coherency protocol any amount of caching
can break correctness when files are being modified by another
client. The longer you cache, the more likely you are to see
a breakage.

> > I suspect the slowness is caused by the extra overhead of doing the
> > Open/Close operations against the server. The only way to avoid
> > doing
> > these against the server for NFSv4 is to enable delegations in both
> > client and server. How to do this is documented in "man nfsv4".
> > Basically
> > starting up the nfscbd in the client and setting:
> > vfs.nfsd.issue_delegations=1
> > in the server.
> >
> > Specifically for nfs_close(), the attributes (modify time)
> > are used for what is called "close to open consistency". This can be
> > disabled by the "nocto" mount option, if you don't need it for your
> > build environment. (You only need it if one client is writing a file
> > and then another client is reading the same file.)
> >
> I tried the nocto option in /etc/fstab but it does not show up when
> mount lists the mounted filesystems, so I am not sure if it is being
> used.
Head (and I think stable9) is patched so that "nfsstat -m" shows
all the options actually being used. For 9.1, you just have to trust
that it has been set.
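
If you want to rule out fstab parsing as the problem, you can also pass
the option explicitly on a manual mount. A sketch, reusing the export
from your fstab:

  umount /usr/obj
  mount -t nfs -o nfsv4,nocto 192.168.178.24:/data/obj /usr/obj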

> On the server netstat shows an active connection to port 7745 on the
> client
> but snoop shows no data flowing on that session.
> 
That connection is the callback path and is only used when delegations
are in use. You can check to see if the server is issuing delegations
via "nfsstat -e -s" on the server and looking at the delegation count,
to see if it is greater than 0.
Remember that you must enable delegations on the server by setting the
sysctl:
vfs.nfsd.issue_delegations=1
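
Roughly, the setup described in "man nfsv4" looks like this (a sketch;
the rc.conf variable comes from the stock rc scripts, and the sysctl
only applies to a FreeBSD server, so your OpenIndiana server has its
own delegation setting):

  # on the FreeBSD client: start the callback daemon
  echo 'nfscbd_enable="YES"' >> /etc/rc.conf
  service nfscbd start

  # on a FreeBSD server only:
  sysctl vfs.nfsd.issue_delegations=1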

> > Both the attribute caching and close to open consistency algorithms
> > in the client are essentially the same for NFSv3 vs NFSv4.
> >
> > The NFSv4 Close operation(s) are actually done when the v_usecount
> > for
> > the vnode goes to 0, since mmap'd files can do I/O on pages after
> > the close syscall. As such, they are only loosely related to the
> > close
> > syscall. They are actually closing Windows style Openlock(s).
> >
> I had a look at the code of the NFS v4 client of Illumos (which is
> basically what my server is running) and as far as I understand it
> they only do the getattr when the close was for a file that was
> opened for write and when there was actually something written to
> the file.
> The FreeBSD code seems to do the getattr for all close() calls.
> For files that were never written, like executables or source files,
> that seems to cause quite a lot of overhead.
> 
Well, what would happen for the Illumos client if another client had
just written to the file and closed it just before Illumos opens it
for reading? For cto, the client needs to see an up-to-date modify
time for the close-to-open consistency check to work, including opens
for reading.

As hinted at above, there is no correct answer to these questions. It
is all about correctness vs. caching for better performance. (That's why
jhb added the nocto option to turn it off if you don't need this to
work correctly.)

> > You mention that you see the same file over and over in a packet
> > trace.
> > You don't give specifics, but I'd suggest that you look at both
> > NFSv3
> > and NFSv4 for this (and file names are in lookups, not getattrs).
> >
> > I'd suggest you try enabling delegations in both client and server,
> > plus
> > trying the "nocto" mount option and see if that helps.
> >
> Tried it but it does not seem to make any noticeable difference.
> 
> I tried a make buildworld buildkernel with /usr/obj a local FS in the
> Vbox VM
> that completed in about 2 hours. With /usr/obj on an NFS v4 filesystem
> it takes
> about a day. A twelve-fold increase in elapsed time makes using NFSv4
> unusable
> for this use case.
Source builds on NFS mounts are notoriously slow. A big part of this is
the synchronous writes that get done because there is only one dirty
byte range for a block and the loader loves to write small non-contiguous
areas of its output file.

I have a patch that extends this to a list of dirty byte ranges, but it
has not been committed to head. I should try and get back to it this
summer.

> Too bad the server hangs when I use an NFSv3 mount for /usr/obj.
Try this mount command:
mount -t nfs -o nfsv3,nolockd ...
(I do builds of the src tree NFS mounted, so the only reason I can
 think of that it would hang would be an rpc.lockd issue.)
If this works, I suspect it will still be slow, but it would be nice to
find out how much slower NFSv4 is for your case.
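
Concretely, with the export from your fstab, that would be something
like this (just a sketch; adjust the options to taste):

  umount /usr/obj
  mount -t nfs -o nfsv3,nolockd 192.168.178.24:/data/obj /usr/obj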

rick

> Having a shared /usr/obj makes it possible to run a make buildworld on
> a single VM
> and just run make installworld on the others.
> 
> Paul
> 
> > rick
> >
> >>
> >> Paul
> >>
> >> _______________________________________________
> >> freebsd-fs at freebsd.org mailing list
> >> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> >> To unsubscribe, send any mail to
> >> "freebsd-fs-unsubscribe at freebsd.org"
> >

