negative cache hits for nfs (was: cvs commit: ...)

Bruce Evans bde at zeta.org.au
Thu Oct 19 04:10:10 PDT 2006


On Wed, 18 Oct 2006, Chuck Lever wrote:

> On 10/18/06, Bruce Evans <bde at zeta.org.au> wrote:
>> Here is a merge of some bits from NetBSD for review.  It is mostly the
>> 1997 version, with updates to use timespecs instead of time_t's, but
>> not updates to use changes that don't seem to be related to correctness,
>> or ones less than 18 months old (if any).

>> ...
>> % +             if (error == ENOENT) {
>> % +                     /* Negative cache hit.  Use it unless stale. */
>> % +                     if (VOP_GETATTR(dvp, &vattr, cnp->cn_cred, td) == 0 
>> &&
>> % +                         timespeccmp(&vattr.va_mtime, &np->n_nctime, 
>> ==))
>> % +                             return (ENOENT);

> Now that you've verified the positive performance impact of this
> change, you should start by testing this with the Connectathon test
> suite (both NFSv2 and NFSv3).  Another useful test in this case is to
> observe how the client behaves in the face of replaced file objects.

Alas, I thought of a large problem without running any tests, ...

> Use another client to remove and recreate a file that your test client
> already has cached, or in this case create first after your client has
> already cached the negative lookup.  Another interesting test case is

... and before reading this carefully.  The directory timestamp (written
by VOP_GETTATTR() into vattr.va_mtime in the above) is normally cached.
Thus it can be stale, and then the negative cache entry can be stale
too but gets used.  With the default min dir. attr timeout of 30seconds
(BTW, why is this much larger than the min of 3 secondzs for non-dirs?),
the problem is easy to see using shell commands typed not very quickly.
I can't see how to fix this without an RPC to refresh the directory
attributes, and this defeats the point of reducing RPCs.  So the
negative cache optimization works best when combined with the no-ctoc
optimization -- then consistency for negative hits is no worse than
for positive ones.

Maybe a refresh for only the second-last component of the path would
be enough (also, only if the lookup is for open) would be good enough.
For positive dircache hits, directory attributes are normally never
refreshed until the timeout, so everything above the last component
of the path can change without us noticing and it is only the forced
refresh for the last component on open that gives us a chance to see
that the file has gone, moved or changed.  For negative dircache hits,
we want similar semantics for opens but don't have a vp for the last
component to refresh.  Second-lastness is harder to determine.

> Have you reviewed the NetBSD change logs for any fixes in this area since 
> 1997?

Some:
- use a timespec instead of a time_t for the timestamp, since we already
   do that for related timestamps
- don't get the comparison logic for the timespec backwards for 8 days
- not merged: some distributed changes involving updating timestamps more
   often since those seem to be only optimizations and I'm not sure if they
   are safe.

I now have some code for moving the attribute cache flush for
close-to-open-consistency to the start of open() and removing it
elsehwere.  This was simple except for debugging and safety-belt
parts, since there are already suitable flags for namei().  It
interacts a bit with negative caching so I'll discuss it in the
same thread now.

% Index: nfs_vnops.c
% ===================================================================
% RCS file: /home/ncvs/src/sys/nfsclient/nfs_vnops.c,v
% retrieving revision 1.270
% diff -u -2 -r1.270 nfs_vnops.c
% --- nfs_vnops.c	14 Oct 2006 07:25:11 -0000	1.270
% +++ nfs_vnops.c	19 Oct 2006 00:03:23 -0000
% @@ -1,2 +1,4 @@
% +int nfs_ctoc = 1;		/* Bogus global enable for cto consistency. */
% +
%  /*-
%   * Copyright (c) 1989, 1993
% @@ -462,5 +464,7 @@
%  		if (error == EINTR || error == EIO)
%  			return (error);
% +		ASSERT_VOP_ELOCKED(vp, "nfs_open");
%  		np->n_attrstamp = 0;
% +		np->n_attrstampco = FALSE;
%  		if (vp->v_type == VDIR)
%  			np->n_direofoffset = 0;

Locking in nfs is dubious.  We mostly hold the exclusive vnode lock and
shouldn't need much mutex locking, but it now has a lot.  The ex.vn.lock
is enough for my new flag, and the above code already holds the mutex
for np so n_attrstamp has more than enough locking.

% @@ -472,5 +476,23 @@
%  		mtx_unlock(&np->n_mtx);
%  	} else {
% -		np->n_attrstamp = 0;
% +		ASSERT_VOP_ELOCKED(vp, "nfs_open");
% +		if (nfs_ctoc && !np->n_attrstampco) {
% +			/*
% +			 * XXX this should not be reached.  It defends
% +			 * against callers of namei() neglecting to set
% +			 * ISOPEN for _all_ opens and against open()
% +			 * dropping the exclusive vnode lock between
% +			 * nfs_lookup() and here.  Note that there can
% +			 * be an nfs_create() call between nfs_open()
% +			 * and here, and that the main benefit of
% +			 * ensuring the attribute refresh earlier than
% +			 * here is for the creation case.  If the lock
% +			 * does get dropped then we should use a timeout
% +			 * here.
% +			 */
% +			printf("nfs_open: missing attrcache refresh\n");
% +			np->n_attrstamp = 0;
% +		}
% +		np->n_attrstampco = FALSE;
%  		mtx_unlock(&np->n_mtx); 
%  		error = VOP_GETATTR(vp, &vattr, ap->a_cred, ap->a_td);

This hasn't been reached lately but took a long time to get right.  When
it is reached, it is equivalent to the old code except for its printf().
It refreshes the attributes just after they have been used, but since
we refetch here there are only minor problems.

% @@ -582,11 +604,4 @@
%  		mtx_lock(&np->n_mtx);
%  	    }
% - 	    /* 
% - 	     * Invalidate the attribute cache in all cases.
% - 	     * An open is going to fetch fresh attrs any way, other procs
% - 	     * on this node that have file open will be forced to do an 
% - 	     * otw attr fetch, but this is safe.
% - 	     */
% -	    np->n_attrstamp = 0;
%  	    if (np->n_flag & NWRITEERR) {
%  		np->n_flag &= ~NWRITEERR;

I think this can be safely removed without any other changes.  We need
to refresh on open(), and scheduling this refresh on close() doesn't
really help.  It doesn't do anything useful if it is on a different
client to the open(), and on the same client it usually just gives an
unnecessary RPC.

% @@ -852,8 +867,24 @@
%  		return (error);
%  	}
% -	if ((error = cache_lookup(dvp, vpp, cnp)) && error != ENOENT) {
% +	if ((error = cache_lookup(dvp, vpp, cnp)) != 0) {
%  		struct vattr vattr;
% 
% +		if (error == ENOENT) {
% +			/* Negative cache hit.  Use it unless stale. */
% +			if (VOP_GETATTR(dvp, &vattr, cnp->cn_cred, td) == 0 &&
% +			    timespeccmp(&vattr.va_mtime, &np->n_nctime, ==))
% +				return (ENOENT);
% +
% +			cache_purge(dvp);
% +			timespecclear(&np->n_nctime);
% +			goto dorpc;
% +		}

Negative cache hit code, same as before.

%  		newvp = *vpp;
% +		if (nfs_ctoc &&
% +		    (flags & (ISLASTCN | ISOPEN)) == (ISLASTCN | ISOPEN)) {
% +			ASSERT_VOP_ELOCKED(vp, "nfs_lookup");
% +			VTONFS(newvp)->n_attrstamp = 0;
% +			VTONFS(newvp)->n_attrstampco = TRUE;
% +		}
%  		if (!VOP_GETATTR(newvp, &vattr, cnp->cn_cred, td)
%  		 && vattr.va_ctime.tv_sec == VTONFS(newvp)->n_ctime) {

This is for a positve vfs cache hit.  Lookups for open() are easily
detected.  I didn't bother locking np for the access to n_attrstamp.
n_attrstampco records that we refreshed the attributes so nfs_open()
shouldn't do it again.  This is mainly a safety belt.  vnode_open_cred()
seems to be exclusively locked almost throughout, so we can safely
pass n_attrstampco to nfs_open() like this, but with exclusive locking
we can almost guarantee that nfs_open() doesn't need to refresh.

% @@ -871,4 +902,5 @@
%  		*vpp = NULLVP;
%  	}
% +dorpc:
%  	error = 0;
%  	newvp = NULLVP;

Negative cache hit code, same as before.

% @@ -935,4 +967,10 @@
%  		newvp = NFSTOV(np);
%  	}
% +	if (nfs_ctoc &&
% +	    (flags & (ISLASTCN | ISOPEN)) == (ISLASTCN | ISOPEN)) {
% +		ASSERT_VOP_ELOCKED(vp, "nfs_lookup");
% +		VTONFS(newvp)->n_attrstamp = 0;
% +		VTONFS(newvp)->n_attrstampco = TRUE;
% +	}
%  	if (v3) {
%  		nfsm_postop_attr(newvp, attrflag);

This is for a remote lookup.  I think the attributes haven't been fetched
yet so clearing n_attrstamp is not needed.

% @@ -951,4 +989,11 @@
%  nfsmout:
%  	if (error) {
% +		if (error == ENOENT && (flags & MAKEENTRY) &&
% +		    cnp->cn_nameiop != CREATE) {
% +			/* Negative cache entry. */
% +			if (!timespecisset(&np->n_nctime))
% +				np->n_nctime = np->n_vattr.va_mtime;
% +			cache_enter(dvp, NULL, cnp);
% +		}
%  		if (newvp != NULLVP) {
%  			vput(newvp);

Negative cache hit code, same as before.

% @@ -1440,4 +1485,12 @@
%  			cache_enter(dvp, newvp, cnp);
%  		*ap->a_vpp = newvp;
% +		if (nfs_ctoc) {
% +			/*
% +			 * XXX this assumes that we will be followed by
% +			 * nfs_open() immediately.
% +			 */
% +			ASSERT_VOP_ELOCKED(vp, "nfs_create");
% +			VTONFS(newvp)->n_attrstampco = TRUE;
% +		}
%  	}
%  	mtx_lock(&(VTONFS(dvp))->n_mtx);

This is in nfs_create().  Creation is interesting.  nfs_lookup() fails,
and if this was due to a negative cache hit the directory timestamp
had better be right.  Then vn_open_cred() calls here.  We don't have
a vp until near here so we can't set n_attrstampco until here.
Optimization of creation is the main possible optimization for the
ctoc case.  When we have just created the file, it is silly for
nfs_open() to force an attribute refresh a few microseconds later.  (I
haven't checked whether the file's cached attributes are ones we just
wrote or ones we just refreshed after creation, and hope that whatever
they are at this point is good enough.)  This optimization reduces the
number of Access RPCs for my kernel build benchmark by about 10% (from
~24000 to ~22000).  Many of the negative cache hits for this benchmark
are probably caused by creations, since the benchmark starts with the
"make clean; ...; make depend" so the "make depend" part creates lots
of negative cache hits for the files just removed.  However, an incorrect
negative cache hit seems most dangerous when the file is being created,
so parts of the optimization are invalid -- the current and previous
implementations of ctoc only refresh the attributes after possibly
clobbering the old ones.

% @@ -1931,6 +1984,9 @@
%  		if (newvp)
%  			vput(newvp);
% -	} else
% +	} else {
% +		if (cnp->cn_flags & MAKEENTRY)
% +			cache_enter(dvp, newvp, cnp);
%  		*ap->a_vpp = newvp;
% +	}
%  	return (error);
%  }

Part of negative cache hit code, same as before, but it is for future
positve cache hits (for the directory crwated by mkdir()).

% Index: nfsnode.h
% ===================================================================
% RCS file: /home/ncvs/src/sys/nfsclient/nfsnode.h,v
% retrieving revision 1.59
% diff -u -2 -r1.59 nfsnode.h
% --- nfsnode.h	13 Sep 2006 18:39:09 -0000	1.59
% +++ nfsnode.h	18 Oct 2006 10:09:19 -0000
% @@ -95,4 +95,5 @@
%  	struct vattr		n_vattr;	/* Vnode attribute cache */
%  	time_t			n_attrstamp;	/* Attr. cache timestamp */
% +	int			n_attrstampco;	/* n_attrs. cleared on open */
%  	u_int32_t		n_mode;		/* ACCESS mode cache */
%  	uid_t			n_modeuid;	/* credentials having mode */
% @@ -100,4 +101,5 @@
%  	struct timespec		n_mtime;	/* Prev modify time. */
%  	time_t			n_ctime;	/* Prev create time. */
% +	struct timespec		n_nctime;	/* Last neg cache entry (dir) */
%  	time_t			n_expiry;	/* Lease expiry time */
%  	nfsfh_t			*n_fhp;		/* NFS File Handle */

Bruce


More information about the freebsd-fs mailing list