FreeBSD NFS client goes into infinite retry loop
Steve Polyack
korvus at comcast.net
Mon Mar 22 15:47:46 UTC 2010
On 03/22/10 10:52, Steve Polyack wrote:
> On 3/19/2010 11:27 PM, Rick Macklem wrote:
>> On Fri, 19 Mar 2010, Steve Polyack wrote:
>>
>> [good stuff snipped]
>>>
>>> This makes sense. According to wireshark, the server is indeed
>>> transmitting "Status: NFS3ERR_IO (5)". Perhaps this should be STALE
>>> instead; it sounds more correct than marking it a general IO error.
>>> Also, the NFS server is serving its share off of a ZFS filesystem,
>>> if it makes any difference. I suppose ZFS could be talking to the
>>> NFS server threads with some mismatched language, but I doubt it.
>>>
>> Ok, now I think we're making progress. If VFS_FHTOVP() doesn't return
>> ESTALE when the file no longer exists, the NFS server returns whatever
>> error it has returned.
>>
>> So, either VFS_FHTOVP() succeeds after the file has been deleted, which
>> would be a problem that needs to be fixed within ZFS
>> OR
>> ZFS returns an error other than ESTALE when it doesn't exist.
>>
>> Try the following patch on the server (which just makes any error
>> returned by VFS_FHTOVP() into ESTALE) and see if that helps.
>>
>> --- nfsserver/nfs_srvsubs.c.sav 2010-03-19 22:06:43.000000000 -0400
>> +++ nfsserver/nfs_srvsubs.c 2010-03-19 22:07:22.000000000 -0400
>> @@ -1127,6 +1127,8 @@
>> }
>> }
>> error = VFS_FHTOVP(mp, &fhp->fh_fid, vpp);
>> + if (error != 0)
>> + error = ESTALE;
>> vfs_unbusy(mp);
>> if (error)
>> goto out;
>>
>> Please let me know if the patch helps, rick
>>
>>
> The patch seems to fix the bad behavior. Running with the patch, I
> see the following output from my patch (return code of nfs_doio from
> within nfsiod):
> nfssvc_iod: iod 0 nfs_doio returned errno: 70
>
> Furthermore, when inspecting the transaction with Wireshark, after
> deleting the file on the NFS server it looks like there is only a
> single error. This time it is a reply to a V3 Lookup call that
> contains a status of "NFS3ERR_NOENT (2)" coming from the NFS server.
> The client also does not repeatedly try to complete the failed request.
>
> Any suggestions on the next step here? Based on what you said it
> looks like ZFS is falsely reporting an IO error to VFS instead of
> ESTALE / NOENT. I tried looking around zfs_fhtovp() and only saw
> returns of EINVAL, but I'm not even sure I'm looking in the right place.
Further down the rabbit hole: here is the piece of zfs_fhtovp() where
it returns EINVAL instead of ESTALE. The following patch corrects the
behavior, but it also suggests digging further into zfs_zget() to
confirm that _it_ returns the correct error, and to decide whether the
fix belongs there or in zfs_fhtovp().
--- src-orig/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c 2010-03-22 11:41:21.000000000 -0400
+++ src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c 2010-03-22 16:25:21.000000000 -0400
@@ -1246,7 +1246,7 @@
dprintf("getting %llu [%u mask %llx]\n", object, fid_gen, gen_mask);
if (err = zfs_zget(zfsvfs, object, &zp)) {
ZFS_EXIT(zfsvfs);
- return (err);
+ return (ESTALE);
}
zp_gen = zp->z_phys->zp_gen & gen_mask;
if (zp_gen == 0)
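
To illustrate why the error code matters, here is a minimal, self-contained
sketch in plain C (userland, not kernel code). Everything prefixed with toy_
is hypothetical and stands in for the real routines: toy_zget() for an object
lookup that fails with EINVAL, toy_fhtovp() for a file-handle lookup that
normalizes any lookup failure to ESTALE as the patch above does, and
toy_errno_to_nfsstat3() for the server's errno-to-status translation. The
status values are the ones from RFC 1813. The "anything unrecognized becomes
NFS3ERR_IO" default is an assumption modeled on what wireshark showed in this
thread; the real server's mapping may differ per procedure.

```c
#include <errno.h>

/* NFSv3 status codes as defined in RFC 1813 (subset). */
enum nfsstat3 {
    NFS3_OK       = 0,
    NFS3ERR_NOENT = 2,
    NFS3ERR_IO    = 5,
    NFS3ERR_STALE = 70
};

/* Hypothetical filesystem holding exactly one live object. */
struct toy_fs {
    unsigned long live_object;
};

/*
 * Hypothetical stand-in for an object lookup: succeeds only for the
 * live object and reports EINVAL for everything else.
 */
static int
toy_zget(const struct toy_fs *fs, unsigned long object)
{
    return (object == fs->live_object) ? 0 : EINVAL;
}

/*
 * Hypothetical stand-in for a file-handle-to-vnode routine. A handle
 * that no longer resolves is reported as ESTALE, whatever the lower
 * layer said.
 */
static int
toy_fhtovp(const struct toy_fs *fs, unsigned long object)
{
    int err = toy_zget(fs, object);

    if (err != 0)
        return (ESTALE);
    return (0);
}

/*
 * Assumed errno-to-NFSv3 translation: known codes map directly,
 * anything else degrades to a generic I/O error.
 */
static enum nfsstat3
toy_errno_to_nfsstat3(int error)
{
    switch (error) {
    case 0:
        return (NFS3_OK);
    case ENOENT:
        return (NFS3ERR_NOENT);
    case ESTALE:
        return (NFS3ERR_STALE);
    default:
        return (NFS3ERR_IO);
    }
}
```

With a deleted object, passing the raw toy_zget() error through the
translation gives the generic I/O status, while going through toy_fhtovp()
gives the stale-handle status, which is the difference the client reacted to
in the traces above.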