FreeBSD NFS client goes into infinite retry loop

Mon Mar 22 23:40:27 UTC 2010

On Mon, 22 Mar 2010, John Baldwin wrote:

>> It looks like it also returns ESTALE when the inode is invalid (<
>> ROOTINO || > max inodes?) - would an unlinked file in FFS referenced at
>> a later time report an invalid inode?
>>

I'm no ufs guy, but the only way I can think of is if the file system
on the server was newfs'd with fewer i-nodes? (Unlikely, but...)
(Basically, it is safe to return ESTALE for anything that is not
  a transient failure that could recover on a retry.)
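
For what it's worth, the range check being asked about can be sketched
roughly as below. This is a standalone illustration, not the actual ffs
code: sketch_check_ino() and SKETCH_ROOTINO are made-up names, and I am
assuming the upper bound is (cylinder groups * i-nodes per group).

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the lowest valid i-node number (2 in UFS). */
#define SKETCH_ROOTINO 2

/*
 * Return 0 if the i-node number from a file handle is within range for
 * the file system, ESTALE otherwise.  Assumption: the file system holds
 * ncg cylinder groups with ipg i-nodes each.
 */
static int
sketch_check_ino(uint64_t ino, uint64_t ncg, uint64_t ipg)
{
	if (ino < SKETCH_ROOTINO || ino >= ncg * ipg)
		return (ESTALE);	/* out of range can never recover on a retry */
	return (0);
}
```

A handle that names an i-node past the end of a smaller, re-created file
system would fail this check on every retry, which is why ESTALE is the
safe answer.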

>> But back to your point, zfs_zget() seems to be failing and returning the
>> EINVAL before zfs_fhtovp() even has a chance to set and check zp_gen.
>> I'm trying to get some more details through the use of gratuitous
>> dprintf()'s, but they don't seem to be making it to any logs or the
>> console even with vfs.zfs.debug=1 set.  Any pointers on how to get these
>> dprintf() calls working?

I know diddly (as in absolutely nothing) about zfs.
>
> That I have no idea on.  Maybe Rick can chime in?  I'm actually not sure why
> we would want to treat a FHTOVP failure as anything but an ESTALE error in the
> NFS server to be honest.
>
As far as I know, only if the underlying file system somehow has a 
situation where the file handle can't be translated at that point in time, 
but could be able to later. I have no idea if any file system is like that 
and I don't think such a file system would be an appropriate choice for an NFS 
server, even if such a beast exists. (Even then, although FreeBSD's client 
assumes EIO might recover on a retry, that isn't specified in any RFC, as 
far as I know.)

That's why I proposed a patch that simply translates all VFS_FHTOVP()
errors to ESTALE in the NFS server. (It seems simpler than chasing down 
cases in all the underlying file systems?)
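
The idea, as a standalone sketch only (this is not the patch itself, and
every name below is hypothetical; sketch_fhtovp_fn just stands in for
whatever VFS_FHTOVP() ends up calling in the underlying file system):

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical lookup type: translate a file handle into a vnode pointer. */
typedef int (*sketch_fhtovp_fn)(const void *fh, void **vpp);

/*
 * Server-side wrapper: whatever error the file system reports for the
 * handle translation (EINVAL, EIO, ...), the client is told ESTALE.
 */
static int
sketch_nfsrv_fhtovp(sketch_fhtovp_fn fhtovp, const void *fh, void **vpp)
{
	int error;

	error = fhtovp(fh, vpp);
	if (error != 0) {
		*vpp = NULL;
		error = ESTALE;	/* never hand back an error the client retries on */
	}
	return (error);
}

/* Toy lookup for illustration: fails with EINVAL when given no handle. */
static int
sketch_fake_lookup(const void *fh, void **vpp)
{
	if (fh == NULL)
		return (EINVAL);
	*vpp = (void *)fh;
	return (0);
}
```

With something like this in place, an EINVAL from the file system
reaches the client as ESTALE, so the client gives up on the handle
instead of retrying forever.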

rick, chiming in:-)