RE: NFS 4.1 RECLAIM_COMPLETE FS failed error in combination with ESXi client

NAGY Andreas Andreas.Nagy at frequentis.com
Fri Mar 9 15:26:06 UTC 2018


>The attached patch changes BindConnectionToSession to reply with NFS4ERR_BAD_SESSION when the session doesn't exist. This might trigger recovery after a server reboot.
>This patch must be applied after bindconn.patch.

Works perfectly! The ESXi host reconnects to the datastore as soon as the NFS server is available again.

>I took a quick look at your packet trace and it appears that this client does all writes FILESYNC. As such, setting vfs.nfsd.async=1 won't have any effect.
>If you apply the attached patch, it should change the FILESYNC->UNSTABLE so that vfs.nfsd.async=1 will make a difference. Again, doing this does put data at risk when the server crashes.

Yes, ESXi writes everything sync. With the patch plus vfs.nfsd.async=1 (only for testing), writes are a bit faster.
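For reference, toggling it on the running system and making it persistent is straightforward (same data-at-risk caveat as above):

    # enable async NFS writes (testing only, data at risk if the server crashes)
    sysctl vfs.nfsd.async=1

    # to keep it across reboots, add this line to /etc/sysctl.conf:
    vfs.nfsd.async=1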
I have not tuned anything in the filesystem settings so far (just formatted it as standard UFS). Compared to multipath iSCSI with VMFS, reads are a little faster, but writes are a bit slower.
That's just what I see from a simple ATTO benchmark inside a Windows VM; I have not done any detailed IOPS benchmarks.
The RAID controller in these machines does not support IT mode, but I think I will still use ZFS, just as a filesystem on a single hardware RAID volume. I still have to check the best settings for ZFS on hardware RAID when serving NFS.
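A rough sketch of what I have in mind, assuming the hardware RAID volume shows up as da0 (device and dataset names are just placeholders):

    # single-device pool on top of the hardware RAID volume (no ZFS-level redundancy)
    zpool create tank da0
    # dataset to export over NFS
    zfs create -o mountpoint=/export/vms tank/vms
    # ZFS honors the client's sync writes; this would be the ZFS-level
    # equivalent of vfs.nfsd.async=1, with the same data-loss risk:
    # zfs set sync=disabled tank/vms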

>This one is a mystery to me. It seemed to be upset that the directory is changing (I assume either the Change or ModifyTime attributes). However, if entries are being deleted, the directory is changing and, as far as I know, the Change and ModifyTime attributes are supposed to change.
>I might try posting on nfsv4 at ietf.org in case somebody involved with this client reads that list and can explain what this is?

Maybe it is really just a bug in the VMware integrated host client browser. Deleting folders on the mounted datastore from the ESXi shell is no problem and also does not generate any warnings in vmkernel.log.

>I'll take another look and see if I can guess why it doesn't like "2" as a reason for not issuing a delegation. (As noted before, I don't think this is serious, but???)

These warnings are still there, but they don't seem to have any impact. It looks like they only appear when files are created or modified on the datastore from the datastore browser or from the shell; I have not seen these warnings when working in a VM whose virtual disk is stored on the NFS datastore.

>Yes, before I posted that I didn't understand why multiple TCP links would be faster.
>I didn't notice at the time that you mentioned using different subnets and, as such, links couldn't be trunked below TCP. In your case trunking above TCP makes sense.

Yes, I am actually working on a lab/test environment; I often get servers as leftovers from projects, and there are also plenty of Cisco 1 Gb switches, but 10 Gb ones are rare. I already did some tests with iSCSI multipathing, but I prefer NFS, and now with this I get the same speed.
I have never seen a working setup with multiple paths for NFS before, so I am really happy that you got this working in such a short time.
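In case somebody wants to reproduce the setup, the ESXi-side mount with one server address per subnet looks roughly like this (addresses, export path, and datastore name are placeholders; syntax is from the ESXi 6.x NFS 4.1 esxcli namespace):

    esxcli storage nfs41 add -H 192.168.10.10,192.168.20.10 -s /export/vms -v nfs41-datastore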

andi


