NFS server bottlenecks

Sun Oct 14 02:18:29 UTC 2012

Ivan Voras wrote:
> On 13 October 2012 23:43, Rick Macklem <rmacklem at uoguelph.ca> wrote:
> 
> > If, as you proposed, you use separate LRU lists for each hash bucket,
> > then how do you know if the least recently used entry for one hash
> > bucket isn't much more recently used than the least recently used
> > entry for another hash bucket? (The hash code is using xid, which
> > might be about the same for different clients at the same time.)
> 
> I'm not familiar enough with the code to judge: would that be a
> problem, other than a (seemingly slight) loss of efficiency?
> 
> Is there any other purpose to the LRU list except to help remove stale
> entries? I haven't done any real examination of how it works, but
> looking at the code in:
> 
> http://fxr.watson.org/fxr/source/fs/nfsserver/nfs_nfsdcache.c#L780
> 
> ... I don't see how the LRU property of the list actually helps
> anything (i.e. would the correctness of the code be damaged if this
> was an ordinary list without the LRU property?)

The concept behind the DRC (duplicate request cache) is as follows (it was
published in Usenix long ago; the reference is in a comment in the code):
- When NFS is run over UDP, the client will wait for a reply from the
  server with a timeout. When there is a timeout, the client will resend
  the RPC request.
  - If the timeout occurs because the server was slow to reply (due to heavy
    load or for some other reason) or because the reply was lost by the
    network, this retransmit of the RPC request would result in the RPC
    being re-done on the server.
    - For idempotent RPCs (like read), this increases load on the server.
    - For non-idempotent RPCs, this can result in corrupted data.
- The DRC minimizes the likelihood of this occurring by caching replies
  for non-idempotent RPCs, so the server can reply from the cache instead
  of re-doing the RPC.
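
As a rough illustration of the idea above (this is not the code in
nfs_nfsdcache.c; every name here is made up for the sketch, and a real
DRC matches on more than just the xid, e.g. the client address), a
retransmitted request is answered from the cache rather than executed
a second time:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DRC_SLOTS 64                /* hypothetical cache size */

struct drc_entry {
    bool     valid;
    uint32_t xid;                   /* RPC transaction id sent by the client */
    int      reply;                 /* stand-in for the saved reply */
};

struct drc {
    struct drc_entry slot[DRC_SLOTS];
    unsigned executed;              /* times the RPC was really performed */
};

static void drc_init(struct drc *c)
{
    memset(c, 0, sizeof(*c));
}

/* Stand-in for a non-idempotent operation (e.g. remove or rename). */
static int do_rpc(struct drc *c, uint32_t xid)
{
    c->executed++;
    return (int)(xid * 2 + 1);
}

/* Replay the cached reply for a retransmit, else execute and cache. */
static int drc_handle(struct drc *c, uint32_t xid)
{
    assert(c != NULL);
    struct drc_entry *e = &c->slot[xid % DRC_SLOTS];

    if (e->valid && e->xid == xid)
        return e->reply;            /* duplicate: do not re-do the RPC */

    e->valid = true;
    e->xid = xid;
    e->reply = do_rpc(c, xid);
    return e->reply;
}
```

Calling drc_handle() twice with the same xid performs the operation once
and returns the same reply both times. Note that this toy version simply
overwrites whatever was in the slot, so it says nothing about which entry
should be replaced when the cache is full.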

As such, replies need to stay cached long enough that it is unlikely the
client will still be retrying the RPC. Unfortunately, there is no well
defined time limit, since retry timeouts and network delays vary for
different clients.
Therefore, the server wants to hold onto each cached reply as long as
possible. This means that if you don't replace the least recently used
cached reply, you make the DRC less effective.
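
To make the concern concrete, here is a small sketch (again with made-up
names and a toy array instead of the real lists) comparing the entry a
single global LRU would replace with the entry a per-bucket LRU would
replace. The bucket is assumed to be xid % NBUCKETS purely for the
example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 4                  /* hypothetical number of hash buckets */

struct entry {
    uint32_t xid;
    uint64_t last_used;             /* logical time of last use; smaller = older */
};

static size_t bucket_of(uint32_t xid)
{
    return xid % NBUCKETS;
}

/* Victim chosen by one global LRU list: the oldest entry overall. */
static const struct entry *global_victim(const struct entry *e, size_t n)
{
    const struct entry *v = NULL;

    for (size_t i = 0; i < n; i++)
        if (v == NULL || e[i].last_used < v->last_used)
            v = &e[i];
    return v;
}

/* Victim chosen by a per-bucket LRU list: the oldest entry in bucket b. */
static const struct entry *bucket_victim(const struct entry *e, size_t n,
                                         size_t b)
{
    const struct entry *v = NULL;

    assert(b < NBUCKETS);
    for (size_t i = 0; i < n; i++)
        if (bucket_of(e[i].xid) == b &&
            (v == NULL || e[i].last_used < v->last_used))
            v = &e[i];
    return v;
}
```

With entries {xid 4, time 10} and {xid 8, time 12} in bucket 0 and
{xid 5, time 900} and {xid 9, time 950} in bucket 1, the global list
would replace xid 4, while a new entry hashing to bucket 1 would push out
xid 5, which was used far more recently than anything in bucket 0.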

rick

> 
> > ps: I hope you didn't mind me adding the mailing list. I'd like
> >     others to be able to comment/read the discussion.
> 
> For the others to catch up, I was proposing this approach to Rick:
> 
> http://people.freebsd.org/~ivoras/diffs/nfscache_lock.patch
> 
> (this patch is far from being complete, it's just a sketch of an
> idea). Basically, I'd like to break the global hash lock into
> per-bucket locks and to break the global LRU list into per-bucket
> lists, protected by the same locks.