FreeBSD NFS client goes into infinite retry loop

Mon Mar 22 21:38:19 UTC 2010

On 03/22/10 13:39, John Baldwin wrote:
> On Monday 22 March 2010 12:44:04 pm Steve Polyack wrote:
>    
>> On 03/22/10 12:00, John Baldwin wrote:
>>      
>>> On Monday 22 March 2010 11:47:43 am Steve Polyack wrote:
>>>
>>>        
>>>> On 03/22/10 10:52, Steve Polyack wrote:
>>>>
>>>>          
>>>>> On 3/19/2010 11:27 PM, Rick Macklem wrote:
>>>>>
>>>>>            
>>>>>> On Fri, 19 Mar 2010, Steve Polyack wrote:
>>>>>>
>>>>>> [good stuff snipped]
>>>>>>
>>>>>>              
>>>>>>> This makes sense.  According to wireshark, the server is indeed
>>>>>>> transmitting "Status: NFS3ERR_IO (5)".  Perhaps this should be STALE
>>>>>>> instead; it sounds more correct than marking it a general IO error.
>>>>>>> Also, the NFS server is serving its share off of a ZFS filesystem,
>>>>>>> if it makes any difference.  I suppose ZFS could be talking to the
>>>>>>> NFS server threads with some mismatched language, but I doubt it.
>>>>>>>
>>>>>>>
>>>>>>>                
>>>>>> Ok, now I think we're making progress. If VFS_FHTOVP() doesn't return
>>>>>> ESTALE when the file no longer exists, the NFS server returns whatever
>>>>>> error it has returned.
>>>>>>
>>>>>> So, either VFS_FHTOVP() succeeds after the file has been deleted, which
>>>>>> would be a problem that needs to be fixed within ZFS
>>>>>> OR
>>>>>> ZFS returns an error other than ESTALE when it doesn't exist.
>>>>>>
>>>>>> Try the following patch on the server (which just makes any error
>>>>>> returned by VFS_FHTOVP() into ESTALE) and see if that helps.
>>>>>>
>>>>>> --- nfsserver/nfs_srvsubs.c.sav    2010-03-19 22:06:43.000000000 -0400
>>>>>> +++ nfsserver/nfs_srvsubs.c    2010-03-19 22:07:22.000000000 -0400
>>>>>> @@ -1127,6 +1127,8 @@
>>>>>>            }
>>>>>>        }
>>>>>>        error = VFS_FHTOVP(mp,&fhp->fh_fid, vpp);
>>>>>> +    if (error != 0)
>>>>>> +        error = ESTALE;
>>>>>>        vfs_unbusy(mp);
>>>>>>        if (error)
>>>>>>            goto out;
>>>>>>
>>>>>> Please let me know if the patch helps, rick
>>>>>>
>>>>>>
>>>>>>
>>>>>>              
>>>>> The patch seems to fix the bad behavior.  Running with the patch, I
>>>>> see the following output from my patch (return code of nfs_doio from
>>>>> within nfsiod):
>>>>> nfssvc_iod: iod 0 nfs_doio returned errno: 70
>>>>>
>>>>> Furthermore, when inspecting the transaction with Wireshark, after
>>>>> deleting the file on the NFS server it looks like there is only a
>>>>> single error.  This time it is a reply to a V3 Lookup call that
>>>>> contains a status of "NFS3ERR_NOENT (2)" coming from the NFS server.
>>>>> The client also does not repeatedly try to complete the failed request.
>>>>>
>>>>> Any suggestions on the next step here?  Based on what you said it
>>>>> looks like ZFS is falsely reporting an IO error to VFS instead of
>>>>> ESTALE / NOENT.  I tried looking around zfs_fhtovp() and only saw
>>>>> returns of EINVAL, but I'm not even sure I'm looking in the right place.
>>>>>
>>>>>            
>>>> Further on down the rabbit hole... here's the piece in zfs_fhtovp()
>>>> where it's kicking out EINVAL instead of ESTALE - the following patch
>>>> corrects the behavior, but of course also suggests further digging
>>>> within the zfs_zget() function to ensure that _it_ is returning the
>>>> correct thing and whether or not it needs to be handled there or within
>>>> zfs_fhtovp().
>>>>
>>>> --- src-orig/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c 2010-03-22 11:41:21.000000000 -0400
>>>> +++ src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c 2010-03-22 16:25:21.000000000 -0400
>>>> @@ -1246,7 +1246,7 @@
>>>>         dprintf("getting %llu [%u mask %llx]\n", object, fid_gen, gen_mask);
>>>>         if (err = zfs_zget(zfsvfs, object,&zp)) {
>>>>             ZFS_EXIT(zfsvfs);
>>>> -        return (err);
>>>> +        return (ESTALE);
>>>>         }
>>>>         zp_gen = zp->z_phys->zp_gen&   gen_mask;
>>>>         if (zp_gen == 0)
>>>>
>>>>          
>>> So the odd thing here is that ffs_fhtovp() doesn't return ESTALE if
>>> VFS_VGET() (which calls ffs_vget()) fails; it only returns ESTALE if the
>>> generation count doesn't match.
>>>
>>>
>>>        
>> It looks like it also returns ESTALE when the inode is invalid (<
>> ROOTINO || > max inodes?) - would an unlinked file in FFS referenced at
>> a later time report an invalid inode?
>>
>> But back to your point, zfs_zget() seems to be failing and returning the
>> EINVAL before zfs_fhtovp() even has a chance to set and check zp_gen.
>> I'm trying to get some more details through the use of gratuitous
>> dprintf()'s, but they don't seem to be making it to any logs or the
>> console even with vfs.zfs.debug=1 set.  Any pointers on how to get these
>> dprintf() calls working?
>>      
> That I have no idea on.  Maybe Rick can chime in?  I'm actually not sure why
> we would want to treat a FHTOVP failure as anything but an ESTALE error in the
> NFS server to be honest.
>
>    

I resorted to changing dprintf()s to printf()s.  The failure in 
zfs_fhtovp() is indeed from zfs_zget(), which fails right at the top 
where it calls dmu_bonus_hold():
Mar 22 16:55:44 zfs-dev kernel: zfs_zget(): dmu_bonus_hold() failed, returning err: 17
Mar 22 16:55:44 zfs-dev kernel: zfs_fhtovp(): zfs_zget() failed, bailing out with err: 17
errno 17 seems to map to EEXIST.
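
For anyone decoding these numbers by hand: on FreeBSD, sys/errno.h defines 
EEXIST as 17 and ESTALE as 70, which matches the "errno: 70" seen from 
nfs_doio earlier.  The numeric values are not the same on every OS, so a 
debugging printf is easier to read if it prints the symbolic name.  A 
minimal sketch follows; errno_name() is a hypothetical helper made up for 
illustration, not an existing kernel or libc function, and it only covers 
the handful of errors that came up in this thread.

```c
#include <errno.h>

/*
 * Hypothetical helper (not part of FreeBSD): return the symbolic name
 * for the few errno values discussed in this thread.  It switches on
 * the <errno.h> macros rather than on hard-coded numbers, because the
 * numeric values differ between operating systems.
 */
static const char *
errno_name(int err)
{
	switch (err) {
	case 0:
		return ("0 (success)");
	case ENOENT:
		return ("ENOENT");
	case EIO:
		return ("EIO");
	case EEXIST:
		return ("EEXIST");
	case EINVAL:
		return ("EINVAL");
	case ESTALE:
		return ("ESTALE");
	default:
		return ("unknown");
	}
}
```

A debugging line could then print errno_name(err) with %s next to the 
raw %d value.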

in zfs_zget():
         err = dmu_bonus_hold(zfsvfs->z_os, obj_num, NULL, &db);
         if (err) {
                 ZFS_OBJ_HOLD_EXIT(zfsvfs, obj_num);
                 printf("zfs_zget(): dmu_bonus_hold() failed, returning err: %d\n", err);
                 return (err);
         }
dmu_bonus_hold() calls dnode_hold_impl(), which seems to be what is 
returning EEXIST (17).  It's probably not kosher to modify such returns, 
so I suspect fixing this within the NFS server may be the only option.

Regardless, is it safe to just treat any other error from VFS_FHTOVP() 
as ESTALE within the NFS code?  This is what Rick's testing patch does, 
but it leaves me curious as to how it would act when other real errors 
are returned from an _fhtovp() call.  Perhaps zfs_fhtovp() is the place to 
handle it, translating the error appropriately for the VFS_FHTOVP() 
caller within the NFS server code.  This could avoid any nasty side 
effects for NFS operations on other filesystems where things are already 
working "fine".  This is the point where I'm lost as to where to go next.
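
To make the "translate it in the filesystem" option concrete, here is a 
rough sketch of a selective mapping, as opposed to the blanket "any error 
becomes ESTALE" of the testing patch.  fh_lookup_error() is a hypothetical 
name, and the set of errors treated as "the object is gone" is purely an 
assumption on my part: EEXIST and EINVAL are the values seen above, and 
ENOENT is added as a guess.  Everything else (EIO, for example) passes 
through untouched so that real failures stay visible to the caller.

```c
#include <errno.h>

/*
 * Hypothetical helper (not existing FreeBSD or ZFS code): translate the
 * error from an object lookup into what a file-handle-to-vnode routine
 * could report to the NFS server.
 *
 * Assumption: ENOENT, EEXIST and EINVAL all mean "the object behind this
 * file handle no longer exists".  Any other error is returned unchanged.
 */
static int
fh_lookup_error(int err)
{
	switch (err) {
	case 0:
		return (0);		/* lookup succeeded */
	case ENOENT:
	case EEXIST:
	case EINVAL:
		return (ESTALE);	/* handle refers to a dead object */
	default:
		return (err);		/* real error, pass it through */
	}
}
```

Whether something like this belongs in zfs_fhtovp() or in the NFS server 
is exactly the open question; the sketch only shows what the narrower 
mapping would look like.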

I'll try the experimental NFS server switches soon just to see if it 
handles it any better, but based on the findings so far I would think 
that it would act the same.