lost dotdot caching pessimizes nfs especially

Bruce Evans bde at zeta.org.au
Fri Oct 13 23:07:00 PDT 2006


On Fri, 6 Oct 2006, Bruce Evans wrote:

> This change:
>
> % Index: vfs_cache.c
> % ===================================================================
> % RCS file: /home/ncvs/src/sys/kern/vfs_cache.c,v
> % retrieving revision 1.102
> % retrieving revision 1.103
> % diff -u -2 -r1.102 -r1.103
> % --- vfs_cache.c	13 Jun 2005 05:59:59 -0000	1.102
> % +++ vfs_cache.c	17 Jun 2005 01:05:13 -0000	1.103
> % ...
>
> is responsible for about half of the performance loss since RELENG_4
> for building kernels over nfs (/usr and sys trees on nfs).  The kernel
> build uses "../../" a lot, and the above change apparently results in
> lots of network activity for things that should be cached locally.
>
> Some times for building a RELENG_4 kernel under conditions invariant
> except for the host kernel (after "make clean; sleep 2; make depend;
> make; make clean; sleep 2; make depend" to warm up caches):
>
> kernel:
> RELENG_4                 77.51 real        60.62 user         4.36 sys
> current.2004.07.01       ~78.5 (lost details)
> current.2005.01.01       ~79 (lost details)
> current.2005.06.17       82.42 real        62.50 user         4.71 sys
> current.2005.06.19       89.53 real        62.18 user         5.44 sys
> current.2005.06.17+      ~89.5 (lost details)
>               .17+ = .17 plus above change
> current.2005.06.17+*     86.08 real        62.43 user         5.13 sys
>               .17+* = .17+ with ../.. in Makefile avoided using a symlink
> 			    @ -> <path to sys not using ..>
> RELENG_6                 91.14 real        62.04 user         5.71 sys
> current                  similar to RELENG_6 (lost details)
>
> The total performance loss is about 18%.
>
> The total performance loss for a local sys tree (/usr still on nfs) is much
> smaller (about 4%):
>
> RELENG_4                 65.19 real        60.50 user         3.95 sys
> current.2005.06.17       67.49 real        62.13 user         4.27 sys
> RELENG_6                 67.83 real        61.84 user         4.71 sys
> current                  similar to RELENG_6 (lost details)
>
> The nfs performance for building of things that should be entirely
> cached locally is very dependent on network latency.  Not caching
> things very well causes lots of unnecessary network traffic for Getattr
> and Lookup.  The packets are small, so throughput is unimportant and
> latency dominates.  For building over nfs without -j, the dead time
> (real - user - sys) is almost directly proportional to the latency.
> My usual local network has fairly low latency (~100uS unloaded) and
> the ~14 seconds dead time in the above is for it.  Switching to a 1
> Gbps network with lower quality NICs gives an unloaded latency of ~160uS
> and a dead time of ~21 seconds.  Building with -j helps even for UP,
> at the cost of extra CPU, by letting some processes advance using cached
> stuff while others are waiting for the network.  Building with -j helps
> even more on FreeBSD cluster machines, more because they have a much
> higher network latency than because they are SMP.

I finished finding almost all the lost performance.  As indicated above,
it was almost all in nfs.

This change:

% Index: nfs_vnops.c
% ===================================================================
% RCS file: /home/ncvs/src/sys/nfsclient/nfs_vnops.c,v
% retrieving revision 1.235
% retrieving revision 1.236
% diff -u -2 -r1.235 -r1.236
% --- nfs_vnops.c	6 Dec 2004 18:52:28 -0000	1.235
% +++ nfs_vnops.c	6 Dec 2004 19:18:00 -0000	1.236
% @@ -418,10 +418,11 @@
%  		if (error)
%  			return (error);
% -		np->n_mtime = vattr.va_mtime.tv_sec;
% +		np->n_mtime = vattr.va_mtime;
%  	} else {
% +		np->n_attrstamp = 0;
    		^^^^^^^^^^^^^^^^^^^^
%  		error = VOP_GETATTR(vp, &vattr, ap->a_cred, ap->a_td);
%  		if (error)
%  			return (error);
% -		if (np->n_mtime != vattr.va_mtime.tv_sec) {
% +		if (NFS_TIMESPEC_COMPARE(&np->n_mtime, &vattr.va_mtime)) {
%  			if (vp->v_type == VDIR)
%  				np->n_direofoffset = 0;

and associated changes give silly behaviour that almost doubles the
number of Access RPCs.  One of the associated changes clears n_attrstamp
on close().  Then on open(), since lookup() is called before the above
is reached, nfs_access_otw() has always just been called, and the above
forces another call.
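
To illustrate the doubling, here is a minimal userland sketch, not the
kernel code.  All toy_* names are hypothetical, and the model assumes
that an Access RPC happens exactly when attributes are requested while
the stamp is 0 (attribute cache timeouts are ignored):

```c
#include <assert.h>

/* Toy stand-in for the per-file NFS node; only the stamp matters here. */
struct toy_node {
	long attrstamp;		/* 0 means "cached attributes are invalid" */
};

static long access_rpcs;	/* number of simulated Access RPCs */

/* Simulated attribute fetch: goes over the wire only if the cache is invalid. */
static void
toy_getattr(struct toy_node *np, long now)
{
	if (np->attrstamp == 0) {
		access_rpcs++;		/* stands in for nfs_access_otw() */
		np->attrstamp = now;
	}
}

/* One lookup/open/close cycle; clear_in_open models the line marked above. */
static void
toy_cycle(struct toy_node *np, long now, int clear_in_open)
{
	toy_getattr(np, now);		/* lookup() refreshes the attributes */
	if (clear_in_open)
		np->attrstamp = 0;	/* open() discards the fresh attributes */
	toy_getattr(np, now);		/* open() asks for the attributes again */
	np->attrstamp = 0;		/* close() invalidates the cache */
}

/* Count simulated Access RPCs for a number of open/close cycles. */
long
toy_count_access_rpcs(long cycles, int clear_in_open)
{
	struct toy_node node = { 0 };
	long i;

	assert(cycles >= 0);
	access_rpcs = 0;
	for (i = 0; i < cycles; i++)
		toy_cycle(&node, i + 1, clear_in_open);	/* i + 1 keeps the stamp nonzero */
	return (access_rpcs);
}
```

In this model toy_count_access_rpcs(n, 1) gives 2n and
toy_count_access_rpcs(n, 0) gives n, i.e., the clearing in open() costs
one extra Access RPC per open.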

Counting RPCs gives a good metric for the pessimizations.  Removing the
above clearing in RELENG_6 gives the following improvement:

Before:
        89.90 real        62.16 user         5.50 sys
  Lookup Read Write Create Access Fsstat Setattr Other   Total
   60010 2410  5353    442  43785   1742    5194     6  118942
After:
        86.46 real        62.22 user         5.21 sys
  Lookup Read Write Create Access Fsstat Setattr Other   Total
   59986 2410  5353    442  20935   1742    5194     6   96068

Note the RPC delta-counts barely changed except for the Access one.
About 23000 Access calls were avoided (43785 - 20935 = 22850).  Just
removing the clearing is not correct but is close.

The pessimization in vfs_cache.c 1.103 is now easy to quantify.  It
triples the number of Lookup RPCs.  Removing it in addition to the
above gives a much larger improvement:

        79.24 real        61.87 user         5.04 sys
  Lookup Read Write Create Access Fsstat Setattr Other   Total
   19548 2410  5353    442  20922   1742    5194     6   55617

Note the RPC delta-counts barely changed except for the Lookup one.
About 40000 Lookup calls were avoided.  Just removing the change in
vfs_cache.c 1.103 is not close to being correct.
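
The arithmetic behind these comparisons can be checked with a short
sketch.  The counts are copied from the three tables above; the array
and function names are made up for illustration:

```c
#include <assert.h>

/* Column order of the tables above. */
enum { LOOKUP, READ, WRITE, CREATE, ACCESS, FSSTAT, SETATTR, OTHER, NRPC };

/* RELENG_6 as is. */
const long rpc_before[NRPC] =
    { 60010, 2410, 5353, 442, 43785, 1742, 5194, 6 };
/* With the n_attrstamp clearing removed. */
const long rpc_no_clear[NRPC] =
    { 59986, 2410, 5353, 442, 20935, 1742, 5194, 6 };
/* With the vfs_cache.c 1.103 change removed as well. */
const long rpc_no_clear_no_1_103[NRPC] =
    { 19548, 2410, 5353, 442, 20922, 1742, 5194, 6 };

/* Sum of all columns, i.e., the Total column. */
long
rpc_total(const long *counts)
{
	long sum = 0;
	int i;

	for (i = 0; i < NRPC; i++)
		sum += counts[i];
	return (sum);
}

/* Number of RPCs of one type avoided going from one run to another. */
long
rpc_saved(const long *from, const long *to, int which)
{
	assert(which >= 0 && which < NRPC);
	return (from[which] - to[which]);
}
```

rpc_saved(rpc_before, rpc_no_clear, ACCESS) is 22850, and
rpc_saved(rpc_no_clear, rpc_no_clear_no_1_103, LOOKUP) is 40438; the
ratio 59986/19548 is a little over 3, which is where "triples" comes
from.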

The last major pessimization is another silly one.  The changes to
mark atimes on exec() and mmap() cause a silly null Setattr RPC for
every exec() (more for interpreters?) and every mmap().  This is
easy to fix (almost) correctly.  VOP_SETATTR() is assumed to do
nothing for requests that it doesn't understand, but nfs_setattr()
does null RPCs instead.  The following fix:

% diff -c2 ./nfsclient/nfs_vnops.c~ ./nfsclient/nfs_vnops.c
% *** ./nfsclient/nfs_vnops.c~	Sun Oct  8 23:08:57 2006
% --- ./nfsclient/nfs_vnops.c	Fri Oct 13 09:58:12 2006
% ***************
% *** 669,675 ****
% 
%   	/*
% ! 	 * Setting of flags is not supported.
%   	 */
% ! 	if (vap->va_flags != VNOVAL)
%   		return (EOPNOTSUPP);
% 
% --- 677,684 ----
% 
%   	/*
% ! 	 * Setting of flags and marking of atimes are not supported.
%   	 */
% ! 	if (vap->va_flags != VNOVAL ||
% ! 	    ((bdefix & 4) && (vap->va_vaflags & VA_MARK_ATIME)))
%   		return (EOPNOTSUPP);
%

in addition to the removals gives the following improvement with
bdefix set to 7:

        78.14 real        62.03 user         4.79 sys
  Lookup Read Write Create Access Fsstat Other   Total
   19556 2410  5353    442  19581   1738    14   49094
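
For reference, the dead time (real - user - sys) of the four runs above
can be computed next to their RPC totals with a sketch like the
following.  The times and totals are copied from the tables above; the
struct, the labels and the function names are made up for illustration,
and dividing dead time by the RPC count is only a rough indicator, since
dead time is not necessarily all network wait:

```c
#include <assert.h>

/* One timed kernel build together with its total RPC count. */
struct run {
	const char *label;
	double real, user, sys;		/* seconds, as reported by time(1) */
	long rpcs;			/* Total column of the RPC table */
};

const struct run runs[4] = {
	{ "RELENG_6",                    89.90, 62.16, 5.50, 118942 },
	{ "no n_attrstamp clearing",     86.46, 62.22, 5.21,  96068 },
	{ "and no vfs_cache.c 1.103",    79.24, 61.87, 5.04,  55617 },
	{ "and no Setattr for atimes",   78.14, 62.03, 4.79,  49094 },
};

/* Time spent neither in user nor in system mode, in seconds. */
double
dead_time(const struct run *r)
{
	return (r->real - r->user - r->sys);
}

/* Dead time divided by the number of RPCs, in microseconds. */
double
dead_usec_per_rpc(const struct run *r)
{
	assert(r->rpcs > 0);
	return (dead_time(r) * 1e6 / r->rpcs);
}
```

The dead times come out as 22.24, 19.03, 12.33 and 11.32 seconds, so
they fall together with the RPC totals.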

Bruce

