lost dotdot caching pessimizes nfs especially
Scott Long
scottl at samsco.org
Thu Oct 5 16:28:36 PDT 2006
Bruce Evans wrote:
> This change:
>
> % Index: vfs_cache.c
> % ===================================================================
> % RCS file: /home/ncvs/src/sys/kern/vfs_cache.c,v
> % retrieving revision 1.102
> % retrieving revision 1.103
> % diff -u -2 -r1.102 -r1.103
> % --- vfs_cache.c 13 Jun 2005 05:59:59 -0000 1.102
> % +++ vfs_cache.c 17 Jun 2005 01:05:13 -0000 1.103
> % @@ -494,6 +494,16 @@
> %  			return;
> %  		}
> % +		/*
> % +		 * For dotdot lookups only cache the v_dd pointer if the
> % +		 * directory has a link back to its parent via v_cache_dst.
> % +		 * Without this an unlinked directory would keep a soft
> % +		 * reference to its parent which could not be NULLd at
> % +		 * cache_purge() time.
> % +		 */
> %  		if (cnp->cn_namelen == 2 && cnp->cn_nameptr[1] == '.') {
> % -			dvp->v_dd = vp;
> % +			CACHE_LOCK();
> % +			if (!TAILQ_EMPTY(&dvp->v_cache_dst))
> % +				dvp->v_dd = vp;
> % +			CACHE_UNLOCK();
> %  			return;
> %  		}
>
> is responsible for about half of the performance loss since RELENG_4
> for building kernels over nfs (/usr and sys trees on nfs). The kernel
> build uses "../../" a lot, and the above change apparently results in
> lots of network activity for things that should be cached locally.
>
> Some times for building a RELENG_4 kernel under conditions invariant
> except for the host kernel (after "make clean; sleep 2; make depend;
> make; make clean; sleep 2; make depend" to warm up caches):
>
> kernel:
>     RELENG_4              77.51 real  60.62 user  4.36 sys
>     current.2004.07.01    ~78.5 (lost details)
>     current.2005.01.01    ~79   (lost details)
>     current.2005.06.17    82.42 real  62.50 user  4.71 sys
>     current.2005.06.19    89.53 real  62.18 user  5.44 sys
>     current.2005.06.17+   ~89.5 (lost details)
>         .17+ = .17 plus above change
>     current.2005.06.17+*  86.08 real  62.43 user  5.13 sys
>         .17+* = .17+ with ../.. in Makefile avoided using a symlink
>                 @ -> <path to sys not using ..>
>     RELENG_6              91.14 real  62.04 user  5.71 sys
>     current               similar to RELENG_6 (lost details)
>
> The total performance loss is about 18%.
>
> The total performance loss for a local sys tree (/usr still on nfs) is much
> smaller (about 4%):
>
>     RELENG_4              65.19 real  60.50 user  3.95 sys
>     current.2005.06.17    67.49 real  62.13 user  4.27 sys
>     RELENG_6              67.83 real  61.84 user  4.71 sys
>     current               similar to RELENG_6 (lost details)
>
> The nfs performance for building of things that should be entirely
> cached locally is very dependent on network latency. Not caching
> things very well causes lots of unnecessary network traffic for Getattr
> and Lookup. The packets are small, so throughput is unimportant and
> latency dominates. For building over nfs without -j, the dead time
> (real - user - sys) is almost directly proportional to the latency.
> My usual local network has fairly low latency (~100uS unloaded) and
> the ~14 seconds dead time in the above is for it. Switching to a 1
> Gbps network with lower quality NICs gives an unloaded latency of ~160uS
> and a dead time of ~21 seconds. Building with -j helps even for UP,
> at the cost of extra CPU, by letting some processes advance using cached
> stuff while others are waiting for the network. Building with -j helps
> even more on FreeBSD cluster machines, more because they have a much
> higher network latency than because they are SMP.
>
> Bruce
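
For anyone who wants to redo the arithmetic on the figures quoted above,
here is a minimal sketch in C. The struct and function names are made up
for illustration and come from nowhere in the tree. It computes the dead
time (real - user - sys) and the slowdown relative to a baseline, using
only rows copied from the nfs sys tree table.

```c
#include <assert.h>
#include <stdio.h>

/* One row of time(1) output, in seconds (hypothetical helper type). */
struct build_time {
	const char *label;
	double real, user, sys;
};

/* Dead time: wall-clock time not accounted for by CPU time. */
static double
dead_time(const struct build_time *t)
{
	return (t->real - t->user - t->sys);
}

/* Slowdown of 'cur' relative to 'base', as a percentage of base real time. */
static double
slowdown_pct(const struct build_time *base, const struct build_time *cur)
{
	assert(base->real > 0.0);
	return (100.0 * (cur->real - base->real) / base->real);
}

/* Rows with full details, copied from the nfs sys tree table above. */
static const struct build_time nfs_runs[] = {
	{ "RELENG_4",           77.51, 60.62, 4.36 },
	{ "current.2005.06.17", 82.42, 62.50, 4.71 },
	{ "current.2005.06.19", 89.53, 62.18, 5.44 },
	{ "RELENG_6",           91.14, 62.04, 5.71 },
};

/* Print dead time and slowdown versus the first row for every row. */
static void
print_report(void)
{
	size_t i, n = sizeof(nfs_runs) / sizeof(nfs_runs[0]);

	for (i = 0; i < n; i++)
		printf("%-20s dead %6.2f s  slowdown %5.1f%%\n",
		    nfs_runs[i].label, dead_time(&nfs_runs[i]),
		    slowdown_pct(&nfs_runs[0], &nfs_runs[i]));
}
```

Calling print_report() from a main() gives one line per kernel. The
RELENG_6 row comes out at roughly 17.6% slower than RELENG_4, which is
the "about 18%" quoted above.
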
I started looking at this a while ago, but had to move on to other
things. Do you have any suggestions for a fix?
Scott