NFS on NFS?

Rick Macklem rmacklem at uoguelph.ca
Tue Jul 17 18:59:59 UTC 2007



On Tue, 17 Jul 2007, Eric Anderson wrote:

> Rick Macklem wrote:
>
> Is that really true?  It looked like the NFS handle was created by various 
> file system goo, which could come up again some time in the future.  For 
> instance, fill a file system's inode table, rm all the files, do it again 
> (with different data in the files).  Wouldn't the NFS handle look the same to 
> the client then, but be a different file?  Or when we say 'file' do we mean 
> 'inode' on a file system?
>
The file handle also contains di_gen (the generation #), which is there
specifically to prevent the file handle from accidentally referring to a
new file with the same i-node #. The server is expected to return ESTALE
when a client tries to use a file handle after the file is deleted, and
this error is returned when the generation # in the file handle does not
match di_gen in the i-node. (di_gen is incremented each time the i-node
is re-used.) File systems that do not have the equivalent of di_gen cannot
be exported via NFS correctly (but some people/systems do so anyhow). That
is OK if the file system is read-only.

> Also, by 'T stable', does 'T' mean 'time' here?
Yep. Capital T for a looonnngggg time.

> I'm not certain I completely understand why the clients would get confused. 
> Wouldn't it look something like this:
>
> [File system->NFS server->NFS handle]
>               |
>               V
> [NFS client->virtual file system->NFS server->NFS handle2]
>               |
>               V
> [NFS Client->virtual file system->application]
>
So long as the intermediate server obeys all the rules, it can work:
- File Handle is T-stable (recognized as ESTALE after the file is deleted)
   and still works the same after server reboots, etc.
- fsid in getattr remains the same throughout the file system, even after
   server reboots, etc.
- handles RPCs in an atomic way, so that they are either done or not
   (can't leave things half created after a crash)
   - NFSv2 and v3 clients don't expect servers to maintain any state
     and don't know the server rebooted. They simply retry the RPC until
     they get success or failure back from the server.

Where these schemes usually break down is when the intermediate server
reboots and no longer does the same file handle translations, or assigns
a new, different fsid to the file system, or crosses a mount point
boundary and changes the fsid, or ???

Like I said, a simple proxy that just passes the RPCs along seems
easier to do. For NFSv3 (not v2) the intermediary can grow the size of
the file handle (to a maximum of 64 bytes), so if the real server creates
file handles smaller than 64 bytes, it can add/remove its own stuff, but...
- it then becomes useful for only certain servers
- it has to do lots of copying of args, since the size changes

rick
