lost dotdot caching pessimizes nfs especially
Bruce Evans
bde at zeta.org.au
Fri Oct 13 23:07:00 PDT 2006
On Fri, 6 Oct 2006, Bruce Evans wrote:
> This change:
>
> % Index: vfs_cache.c
> % ===================================================================
> % RCS file: /home/ncvs/src/sys/kern/vfs_cache.c,v
> % retrieving revision 1.102
> % retrieving revision 1.103
> % diff -u -2 -r1.102 -r1.103
> % --- vfs_cache.c 13 Jun 2005 05:59:59 -0000 1.102
> % +++ vfs_cache.c 17 Jun 2005 01:05:13 -0000 1.103
> % ...
>
> is responsible for about half of the performance loss since RELENG_4
> for building kernels over nfs (/usr and sys trees on nfs). The kernel
> build uses "../../" a lot, and the above change apparently results in
> lots of network activity for things that should be cached locally.
>
> Some times for building a RELENG_4 kernel under conditions invariant
> except for the host kernel (after "make clean; sleep 2; make depend;
> make; make clean; sleep 2; make depend" to warm up caches):
>
> kernel:
> RELENG_4              77.51 real  60.62 user  4.36 sys
> current.2004.07.01   ~78.5  (lost details)
> current.2005.01.01   ~79    (lost details)
> current.2005.06.17    82.42 real  62.50 user  4.71 sys
> current.2005.06.19    89.53 real  62.18 user  5.44 sys
> current.2005.06.17+  ~89.5  (lost details)
>     .17+ = .17 plus above change
> current.2005.06.17+*  86.08 real  62.43 user  5.13 sys
>     .17+* = .17+ with ../.. in Makefile avoided using a symlink
>     @ -> <path to sys not using ..>
> RELENG_6              91.14 real  62.04 user  5.71 sys
> current               similar to RELENG_6 (lost details)
>
> The total performance loss is about 18%.
>
> The total performance loss for a local sys tree (/usr still on nfs) is much
> smaller (about 4%):
>
> RELENG_4              65.19 real  60.50 user  3.95 sys
> current.2005.06.17    67.49 real  62.13 user  4.27 sys
> RELENG_6              67.83 real  61.84 user  4.71 sys
> current               similar to RELENG_6 (lost details)
>
> The nfs performance for building of things that should be entirely
> cached locally is very dependent on network latency. Not caching
> things very well causes lots of unnecessary network traffic for Getattr
> and Lookup. The packets are small, so throughput is unimportant and
> latency dominates. For building over nfs without -j, the dead time
> (real - user - sys) is almost directly proportional to the latency.
> My usual local network has fairly low latency (~100uS unloaded) and
> the ~14 seconds dead time in the above is for it. Switching to a 1
> Gbps network with lower quality NICs gives an unloaded latency of ~160uS
> and a dead time of ~21 seconds. Building with -j helps even for UP,
> at the cost of extra CPU, by letting some processes advance using cached
> stuff while others are waiting for the network. Building with -j helps
> even more on FreeBSD cluster machines, more because they have a much
> higher network latency than because they are SMP.
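The latency argument can be made concrete with a little arithmetic on the
tables above. The following C sketch (all names are hypothetical; the
inputs are the time(1) figures quoted above) computes dead time as
real - user - sys and the slowdown relative to a baseline. It also gives an
order-of-magnitude count of synchronous round trips, under the assumption,
suggested by the proportionality noted above, that a build without -j is
latency-bound so that dead time is roughly round trips times latency.

```c
#include <assert.h>

/* One row of the timing tables: seconds as reported by time(1). */
struct build_time {
    const char *kernel;
    double real, user, sys;
};

/*
 * Dead time: elapsed time not spent on the CPU, i.e. mostly time
 * spent waiting for the NFS server to answer.
 */
double dead_time(const struct build_time *t)
{
    return t->real - t->user - t->sys;
}

/* Slowdown of `t` relative to `base`, as a percentage of base->real. */
double loss_percent(const struct build_time *base, const struct build_time *t)
{
    assert(base->real > 0.0);
    return 100.0 * (t->real - base->real) / base->real;
}

/*
 * Assumption, not a measurement: dead time ~ round trips * latency.
 * Server-side processing per RPC is ignored, so this is only an
 * order-of-magnitude estimate.
 */
double est_round_trips(double dead_s, double latency_s)
{
    assert(latency_s > 0.0);
    return dead_s / latency_s;
}
```

For example, RELENG_4 over nfs has a dead time of 77.51 - 60.62 - 4.36 =
12.53 seconds, and RELENG_6 (91.14 real) comes out about 17.6% slower than
RELENG_4.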
I finished finding almost all the lost performance. As indicated above,
it was almost all in nfs.
This change:
% Index: nfs_vnops.c
% ===================================================================
% RCS file: /home/ncvs/src/sys/nfsclient/nfs_vnops.c,v
% retrieving revision 1.235
% retrieving revision 1.236
% diff -u -2 -r1.235 -r1.236
% --- nfs_vnops.c 6 Dec 2004 18:52:28 -0000 1.235
% +++ nfs_vnops.c 6 Dec 2004 19:18:00 -0000 1.236
% @@ -418,10 +418,11 @@
% if (error)
% return (error);
% - np->n_mtime = vattr.va_mtime.tv_sec;
% + np->n_mtime = vattr.va_mtime;
% } else {
% + np->n_attrstamp = 0;
^^^^^^^^^^^^^^^^^^^^
% error = VOP_GETATTR(vp, &vattr, ap->a_cred, ap->a_td);
% if (error)
% return (error);
% - if (np->n_mtime != vattr.va_mtime.tv_sec) {
% + if (NFS_TIMESPEC_COMPARE(&np->n_mtime, &vattr.va_mtime)) {
% if (vp->v_type == VDIR)
% np->n_direofoffset = 0;
and associated changes give silly behaviour that almost doubles the
number of Access RPCs. One of the associated changes clears n_attrstamp
on close(). Then on open(), since lookup() is called before the above
code is reached, nfs_access_otw() has always just been called, and the
marked line discards the attributes that it just cached, which forces
another call.
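A toy model of the open() path may make the doubling easier to see. This
is a heavily simplified C sketch, not the nfsclient code: toy_node,
toy_access_otw(), toy_getattr(), toy_open() and TOY_ATTR_TIMEOUT are all
hypothetical. Following the description above, lookup() always does one
over-the-wire Access call that refreshes the cached attributes; the
extra getattr call is counted as an Access RPC because that is the column
that changes in the counts below.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an NFS client node. */
struct toy_node {
    long attrstamp;               /* when attributes were cached; 0 = invalid */
};

#define TOY_ATTR_TIMEOUT 3        /* hypothetical attribute-cache lifetime */

/* Over-the-wire ACCESS: one RPC, which also refreshes cached attributes. */
static void toy_access_otw(struct toy_node *np, long *rpcs, long now)
{
    (*rpcs)++;
    np->attrstamp = now;
}

/* Getattr: free if the cache is fresh, otherwise one more RPC. */
static void toy_getattr(struct toy_node *np, long *rpcs, long now)
{
    if (np->attrstamp == 0 || now - np->attrstamp >= TOY_ATTR_TIMEOUT)
        toy_access_otw(np, rpcs, now);
}

/* One open(): lookup() runs first and has already asked the server. */
static void toy_open(struct toy_node *np, long *rpcs, long now, bool clear)
{
    toy_access_otw(np, rpcs, now);    /* from lookup() */
    if (clear)
        np->attrstamp = 0;            /* like the marked line in the diff */
    toy_getattr(np, rpcs, now);
}

/* Access RPCs for `opens` opens, spaced so that nothing is shared. */
long toy_count_access(int opens, bool clear)
{
    struct toy_node n = { 0 };
    long rpcs = 0;

    assert(opens >= 0);
    for (int i = 0; i < opens; i++)
        toy_open(&n, &rpcs, 1 + 10L * i, clear);
    return rpcs;
}
```

In this model, toy_count_access(100, true) gives 200 Access RPCs and
toy_count_access(100, false) gives 100.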
Counting RPCs gives a good metric for the pessimizations. Removing the
above clearing in RELENG_6 gives the following improvement:
Before:
89.90 real  62.16 user  5.50 sys
  Lookup    Read   Write  Create  Access  Fsstat Setattr   Other   Total
   60010    2410    5353     442   43785    1742    5194       6  118942
After:
86.46 real  62.22 user  5.21 sys
  Lookup    Read   Write  Create  Access  Fsstat Setattr   Other   Total
   59986    2410    5353     442   20935    1742    5194       6   96068
Note that the RPC delta-counts barely changed except for Access: about
20000 Access calls were avoided. Just removing the clearing is not
correct, but it is close.
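Comparing runs column by column is what makes the RPC counts useful as a
metric. A minimal C sketch of that bookkeeping follows; the enum and
function names are hypothetical, and the columns mirror the tables above.

```c
#include <assert.h>
#include <stddef.h>

/* RPC columns in the order used by the tables. */
enum rpc_col {
    RPC_LOOKUP, RPC_READ, RPC_WRITE, RPC_CREATE,
    RPC_ACCESS, RPC_FSSTAT, RPC_SETATTR, RPC_OTHER, RPC_NCOLS
};

struct rpc_counts {
    long c[RPC_NCOLS];
};

/* Sum of all columns (the "Total" column). */
long rpc_total(const struct rpc_counts *r)
{
    long t = 0;

    for (size_t i = 0; i < RPC_NCOLS; i++)
        t += r->c[i];
    return t;
}

/* Per-column RPCs avoided when going from `before` to `after`. */
void rpc_saved(const struct rpc_counts *before,
               const struct rpc_counts *after, struct rpc_counts *saved)
{
    assert(before != NULL && after != NULL && saved != NULL);
    for (size_t i = 0; i < RPC_NCOLS; i++)
        saved->c[i] = before->c[i] - after->c[i];
}
```

Fed the Before and After rows, this gives 22850 fewer Access RPCs, only 24
fewer Lookups, and 22874 fewer RPCs in total.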
The pessimization in vfs_cache.c 1.103 is now easy to quantify: it
triples the number of Lookup RPCs. Removing it in addition to the
above gives a much larger improvement:
79.24 real  61.87 user  5.04 sys
  Lookup    Read   Write  Create  Access  Fsstat Setattr   Other   Total
   19548    2410    5353     442   20922    1742    5194       6   55617
Note that the RPC delta-counts barely changed except for Lookup: about
40000 Lookup calls were avoided. Just removing the change in
vfs_cache.c 1.103 is not close to being correct.
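Why "../../" paths hurt so much can be shown with a toy count. The C
sketch below assumes, as the subject line suggests, that after vfs_cache.c
1.103 ".." is no longer answered from the name cache, while all other
components are warm cache hits. toy_lookup_rpcs() and the example path are
hypothetical, and the model does not attempt to reproduce the measured
factor of three.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Count Lookup RPCs needed to resolve a relative path, assuming every
 * component except possibly ".." is already in the name cache.
 */
int toy_lookup_rpcs(const char *path, bool cache_dotdot)
{
    char buf[256];
    int rpcs = 0;

    assert(strlen(path) < sizeof(buf));
    strcpy(buf, path);            /* strtok() modifies its argument */
    for (char *comp = strtok(buf, "/"); comp != NULL;
         comp = strtok(NULL, "/")) {
        if (strcmp(comp, "..") == 0 && !cache_dotdot)
            rpcs++;               /* ".." misses the cache: ask the server */
    }
    return rpcs;
}
```

With ".." uncached, every resolution of a path like "../../kern/vfs_cache.c"
costs two Lookup RPCs; with it cached, a warm cache costs none.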
The last major pessimization is another silly one. The changes to
mark atimes on exec() and mmap() cause a silly null Setattr RPC for
every exec() (more for interpreters?) and every mmap(). This is
easy to fix (almost) correctly. VOP_SETATTR() is assumed to do
nothing for requests that it doesn't understand, but nfs_setattr()
does null RPCs instead. The following fix:
% diff -c2 ./nfsclient/nfs_vnops.c~ ./nfsclient/nfs_vnops.c
% *** ./nfsclient/nfs_vnops.c~ Sun Oct 8 23:08:57 2006
% --- ./nfsclient/nfs_vnops.c Fri Oct 13 09:58:12 2006
% ***************
% *** 669,675 ****
%
% /*
% ! * Setting of flags is not supported.
% */
% ! if (vap->va_flags != VNOVAL)
% return (EOPNOTSUPP);
%
% --- 677,684 ----
%
% /*
% ! * Setting of flags and marking of atimes are not supported.
% */
% ! if (vap->va_flags != VNOVAL ||
% ! ((bdefix & 4) && (vap->va_vaflags & VA_MARK_ATIME)))
% return (EOPNOTSUPP);
%
in addition to the removals gives the following improvement with
bdefix set to 7:
78.14 real  62.03 user  4.79 sys
  Lookup    Read   Write  Create  Access  Fsstat   Other   Total
   19556    2410    5353     442   19581    1738      14   49094
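A rough way to relate the RPC savings to the time savings is to divide one
by the other. The C sketch below does that for two runs; struct run and
secs_per_avoided_rpc() are hypothetical, and the result is crude because
it attributes the whole change in real time to the avoided RPCs and
ignores the small user and sys differences.

```c
#include <assert.h>

/* One run: elapsed seconds and total NFS RPCs, from the tables. */
struct run {
    double real_s;
    long rpcs;
};

/* Average elapsed seconds saved per avoided RPC (crude, see above). */
double secs_per_avoided_rpc(const struct run *before, const struct run *after)
{
    long avoided = before->rpcs - after->rpcs;

    assert(avoided > 0);
    return (before->real_s - after->real_s) / (double)avoided;
}
```

For the first run (89.90 real, 118942 RPCs) and the last one (78.14 real,
49094 RPCs), this works out to roughly 0.17 ms per avoided RPC, the same
order as the ~100-160uS network latencies quoted earlier.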
Bruce