vn_fullpath() again

Matthew Dillon dillon at apollo.backplane.com
Wed Sep 7 09:24:00 PDT 2005


:The NFS server problem I was referring to specifically came up in the case
:of the DTE work as follows, and can't be solved through modifications
:to the name cache:
:
:- An NFS client uses lookups to retrieve the file handle of /foo/bar/baz
:   on an NFS server.  It sets the cwd of a process on the client to that
:   handle.
:
:- The NFS server reboots, flushing its name cache and other state.
:
:- The NFS client begins to look up and access files relative to the handle
:   of /foo/bar/baz, which is fine because it has a file handle.
:
:All further file access by the NFS client is done relative to directories 
:that don't appear in the name cache unless faulted in through further 
:access to that directory via the file system root, and access to existing 
:open files may be done without access to any directory at all.  It may 
:even happen if the file isn't open on the client but the path data is 
:cached in the client avoiding server access. This doesn't just apply to 
:direct remote access -- local access by file handle for rpc.lockd and 
:friends is similarly affected, such as access via fhopen() in order to 
:proxy locking operations.  In the case of DTE, the file server was 
:required to reconstruct a path to the directory and files, which in the 
:case of a pure directory access required a number of I/O's to walk up the 
:tree to the root.  In the case of simply accessing a file and not a 
:directory, it required an O(n) search of directories on the disk to find a
:directory that referenced the file.  Both of these mechanisms fall 
:rather firmly into the "undesirable" category, but are necessary if you
:want reliable and arbitrary vnode->path conversion.

    I reviewed the DragonFly implementation and I was incorrect about
    the NFS(v3) server being able to resolve the namecache/path for file
    handles representing regular files.  It can only do so for directories,
    but since all namespace operations (lookup, rename, remove, etc)
    pass a directory file handle in the RPC, this is all the NFS server
    needs to satisfy DragonFly's VOP requirements for namespace operations.
    So in DragonFly it is still possible to have a dangling regular file
    vnode on the NFS server, but it is not possible for a dangling vnode to
    exist with regard to any namespace operation (since DragonFly is 
    required to pass namecache pointers for namespace VOPs like REMOVE,
    RENAME, LOOKUP, etc, and the directory vnode is available in all
    such cases).
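
    (For reference, every NFSv3 namespace RPC is expressed as a directory
    handle plus a name.  The sketch below follows RFC 1813's diropargs3,
    with the variable-length opaque handle simplified to a fixed buffer;
    it is illustrative, not the wire encoding.)

#include <stdint.h>

#define NFS3_FHSIZE 64          /* max opaque handle size per RFC 1813 */

/* Simplified sketch of RFC 1813's diropargs3, used by LOOKUP/REMOVE/etc. */
struct nfs_fh3 {
        uint32_t fh_len;                /* actual length of the opaque handle */
        uint8_t  fh_data[NFS3_FHSIZE];  /* server-private file handle bytes */
};

struct diropargs3 {
        struct nfs_fh3 dir;             /* handle of the directory ... */
        const char    *name;            /* ... and the name within it */
};

/*
 * Because every namespace RPC is expressed as (directory handle, name),
 * the server always has a directory vnode in hand and can re-resolve its
 * namecache entry before executing the namespace VOP.
 */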

    In any case, the resolution case for directories is pretty trivial.  I
    have debugging code which prints out to the console whenever DFly has to
    walk the directory tree backwards to resolve a directory from fhopen,
    and most of the time the namecache entry is either already cached from
    prior operations, or the walk only has to scan one directory level 
    because namecache entries for higher levels are already cached from
    prior operations.  Even when working on a very large directory tree
    after a server reboot, the directory nodes in the namecache are very
    quickly repopulated and very quickly become optimal.  I tested that 
    too and it wasn't a big deal.  I consider the reverse resolution case
    for directories to be a solved problem.
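
    For anyone curious, the backwards walk is essentially the loop below.
    All of the helper names are hypothetical stand-ins for the real
    VFS/namecache primitives; only the loop structure matters.

#include <errno.h>
#include <limits.h>
#include <stddef.h>

struct vnode;                   /* opaque stand-in for the kernel vnode */

/* Hypothetical helpers standing in for real VFS/namecache primitives. */
int           is_cached(struct vnode *vp);
int           is_fs_root(struct vnode *vp);
struct vnode *lookup_dotdot(struct vnode *vp);
unsigned long vnode_ino(struct vnode *vp);
int           scan_dir_for_ino(struct vnode *dvp, unsigned long ino,
                  char *name, size_t namelen);
void          cache_enter_name(struct vnode *dvp, const char *name,
                  struct vnode *vp);

/*
 * Walk up the tree from an uncached directory vnode, entering each level
 * into the namecache as it is discovered.  In practice the loop almost
 * always runs zero or one times because the higher levels are already
 * cached from prior operations.
 */
int
resolve_dir_backwards(struct vnode *dvp)
{
        struct vnode *vp = dvp;
        char name[NAME_MAX + 1];

        while (!is_cached(vp) && !is_fs_root(vp)) {
                struct vnode *pvp = lookup_dotdot(vp);  /* go up via ".." */
                if (pvp == NULL)
                        return (ENOENT);

                /* Find the entry in the parent that points back at vp. */
                if (scan_dir_for_ino(pvp, vnode_ino(vp), name,
                    sizeof(name)) != 0)
                        return (ENOENT);

                cache_enter_name(pvp, name, vp);        /* next time: hit */
                vp = pvp;
        }
        return (0);
}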

    The regular file resolution issue for changes made by an NFS server
    is something I need to fix for our journaling code, to reduce
    the number of instances where the userland side of the journaling
    stream will have to index by the inode number.  I think the regular
    file case can be mostly solved by embedding the directory inode number
    in the file handle for any file that is looked up, at least for NFSv3.
    NFSv2 is a lost cause; its file handles simply are not big enough.

    Lookups would mask out the directory inode part of the handle (i.e.
    you wouldn't get ESTALE if the file happened to be renamed to another
    directory), but namepath resolution would be able to use the directory
    inode data as a hint.  This would result in nearly all such files
    being resolvable with a scan of only a single directory per file,
    and the other files encountered during that scan could be cached as well.
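
    Concretely, the extended handle I have in mind would look something
    like this.  The field names are illustrative, not the actual UFS
    handle structure:

#include <stdint.h>

/*
 * Hypothetical layout of an extended file handle.  The real UFS handle
 * carries roughly (inode, generation); the parent directory inode is
 * the proposed addition.
 */
struct xufid {
        uint32_t xu_ino;        /* inode number of the file itself */
        uint32_t xu_gen;        /* generation number (staleness check) */
        uint32_t xu_dirino;     /* directory inode recorded at lookup time */
};

/*
 * Identity/staleness checks ignore xu_dirino, so renaming the file into
 * another directory does not produce ESTALE.
 */
int
xufid_matches(const struct xufid *fh, uint32_t ino, uint32_t gen)
{
        return (fh->xu_ino == ino && fh->xu_gen == gen);
}

/*
 * Reverse resolution, on the other hand, uses xu_dirino as a hint for
 * which single directory to scan first.
 */
uint32_t
xufid_dir_hint(const struct xufid *fh)
{
        return (fh->xu_dirino);
}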

:Hence my asking about who the consumer is, when they need the data, and 
:how timely the data needs to be. If you're only interested in the actions 
:of local processes, then the name cache can be a somewhat reliable source 
:of name data.  If you're also interested in identifying the actions of 
:remote consumers of files accessed by file handle, which may account for 
:the majority of activity on a file system on an NFS file server -- i.e., 
:for a server side access monitoring tool, then that may cause significant 
:difficulty.  This is because, as has been discussed extensively, paths are 
:traversals of a name space at a particular time, and not properties of 
:files themselves.
:
:Robert N M Watson

    Yes, I agree.  It just so happens that an audit or mirroring operation
    based on a stream of journaling data is also time-centric, so path
    information is exactly what those operations need to run optimally.
    An auditing program, especially, is going to be universally
    path-based.  The DragonFly journaling code tries to provide both path
    and inode number info for all transactions and it is the path info that
    makes the (userland) mirroring code optimal.

    I am going to look into the file handle issue.  At least for UFS and
    NFSv3, the file handle should be big enough to hold the directory inode
    originally used to look up the file, to shortcut re-resolution of the
    namecache on the NFS server when that file handle is supplied at a later
    time.  All remaining cases (i.e. the file was renamed to another
    directory) would be non-optimal from the point of view of the
    consumer, meaning that the consumer would have to index the inodes in
    those cases to accomplish the reverse resolution.  For DragonFly, 
    the consumer in this case is the backend of a journaling stream, which
    is either going to be a mirror or an auditing program.  The goal would
    be to greatly reduce the instances where such programs have to index
    and look up a path based on an inode number.
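
    On the consumer side, the fallback index could be as simple as an
    inode -> last-known-path map that is only consulted for records
    arriving without usable path data.  The sketch below uses made-up
    names and a fixed-size table purely for illustration:

#include <stdint.h>
#include <string.h>

#define IDX_MAX         4096            /* illustrative fixed capacity */
#define IDX_PATHLEN     1024

/* Consumer-side (mirror/audit) inode -> last-known-path index. */
struct ino_path {
        uint64_t ino;
        char     path[IDX_PATHLEN];
};

static struct ino_path idx[IDX_MAX];
static size_t          idx_n;

/* Record a path seen in a journal record that did carry path info. */
void
idx_update(uint64_t ino, const char *path)
{
        for (size_t i = 0; i < idx_n; i++) {
                if (idx[i].ino == ino) {
                        strncpy(idx[i].path, path, IDX_PATHLEN - 1);
                        idx[i].path[IDX_PATHLEN - 1] = '\0';
                        return;
                }
        }
        if (idx_n < IDX_MAX) {
                idx[idx_n].ino = ino;
                strncpy(idx[idx_n].path, path, IDX_PATHLEN - 1);
                idx[idx_n].path[IDX_PATHLEN - 1] = '\0';
                idx_n++;
        }
}

/*
 * Resolve a record that only carried an inode number; NULL means the
 * consumer has to fall back to an expensive directory scan.
 */
const char *
idx_lookup(uint64_t ino)
{
        for (size_t i = 0; i < idx_n; i++)
                if (idx[i].ino == ino)
                        return (idx[i].path);
        return (NULL);
}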

    Another possible augmentation for the NFS server side which would
    completely guarantee our ability to reverse-resolve the path ON the
    NFS server is to have the NFS server keep track of any inode number
    which has been renamed to a different directory.  It would have to
    keep track of (newdirectory_inode, inode_number) in a persistent file
    to survive reboots.  The data structure would be collapsible to handle
    cases where many inodes are renamed... the main purpose would simply be
    to return a limited set of directory inodes that might need to be scanned
    to resolve a file handle containing an inode which cannot be found in
    the original directory it was opened in.
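
    The persistent file could be little more than an append-only list of
    (inode, new directory inode) records, e.g. (layout and names below
    are illustrative only):

#include <stdint.h>
#include <stdio.h>

/*
 * One record is appended whenever RENAME moves an inode to a different
 * directory.  Reverse resolution of a handle whose inode is no longer
 * in its original directory then scans only the directories listed here
 * for that inode.  fp is assumed open in update ("a+b") mode so it can
 * be both appended to and scanned.
 */
struct rename_rec {
        uint64_t rr_ino;        /* inode that moved */
        uint64_t rr_newdirino;  /* directory it was moved into */
};

/* Append one record; the file survives reboots, unlike the namecache. */
int
rename_log_append(FILE *fp, uint64_t ino, uint64_t newdirino)
{
        struct rename_rec rr = { .rr_ino = ino, .rr_newdirino = newdirino };

        if (fwrite(&rr, sizeof(rr), 1, fp) != 1)
                return (-1);
        return (fflush(fp));
}

/*
 * Collect candidate directories for an inode.  A real implementation
 * would collapse duplicates and expire old entries; a linear scan keeps
 * the sketch short.
 */
size_t
rename_log_candidates(FILE *fp, uint64_t ino, uint64_t *dirs, size_t maxdirs)
{
        struct rename_rec rr;
        size_t n = 0;

        rewind(fp);
        while (n < maxdirs && fread(&rr, sizeof(rr), 1, fp) == 1)
                if (rr.rr_ino == ino)
                        dirs[n++] = rr.rr_newdirino;
        return (n);
}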

					-Matt
					Matthew Dillon 
					<dillon at backplane.com>

