NFS 4.1 RECLAIM_COMPLETE FS failed error

Rick Macklem rmacklem at uoguelph.ca
Mon Jul 9 02:10:44 UTC 2018


Daniel Engel wrote:
[stuff snipped]
>I traced the commits that Rick has made since that thread and merged them
>from 'head' into 'stable':
>
>    'svnlite checkout http://svn.freebsd.org/base/release/11.1.0/'
>    'svnlite merge -c 332790 http://svn.freebsd.org/base/head'
>    'svnlite merge -c 333508 http://svn.freebsd.org/base/head'
>    'svnlite merge -c 333579 http://svn.freebsd.org/base/head'
>    'svnlite merge -c 333580 http://svn.freebsd.org/base/head'
>    'svnlite merge -c 333592 http://svn.freebsd.org/base/head'
>    'svnlite merge -c 333645 http://svn.freebsd.org/base/head'
>    'svnlite merge -c 333766 http://svn.freebsd.org/base/head'
>    'svnlite merge -c 334396 http://svn.freebsd.org/base/head'
>    'svnlite merge -c 334492 http://svn.freebsd.org/base/head'
>    'svnlite merge -c 327674 http://svn.freebsd.org/base/head'
Yes, you have all the commits to head related to the 4.1 server that might affect
the ESXi client, plus a bunch that should be harmless but that I don't think affect
the ESXi client mounts. (Most of these will get MFC'd to stable/11, but I haven't
gotten around to it yet.)

The ESXi client problems that might still be in 6.7 (they were in 6.5) and that
may bite you are:
- The client does an OpenDowngrade with all OPEN_SHARE_ACCESS and
  OPEN_SHARE_DENY bits set for something it calls a "drive lock".
  (Adding bits is supposed to be done via an Open/ClaimNull and not
  OpenDowngrade.) I'd really like to know if this still happens for 6.7.
- Something about "directory modified too often" when doing deletion of a bunch
  of files. (I have no idea what this one means, but apparently it was seen for
  other NFSv4.1 servers.)
- Some warnings about "wrong reason for not issuing a delegation". I have a fix
  for this one in PR#226650, but they are just warnings and don't seem to
  matter much.
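
To make the first item concrete, here is a rough sketch of the check a server
could apply to an OpenDowngrade. It is not the FreeBSD server code: the struct
and function names are made up for illustration, and it simplifies the RFC 5661
rule to "the requested bits must be a subset of what the open already holds".
Only the share bit values are taken from the RFC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Share bit values as defined for NFSv4 OPEN in RFC 5661. */
#define OPEN4_SHARE_ACCESS_READ   0x1u
#define OPEN4_SHARE_ACCESS_WRITE  0x2u
#define OPEN4_SHARE_DENY_READ     0x1u
#define OPEN4_SHARE_DENY_WRITE    0x2u

/* Hypothetical per-open state; NOT the FreeBSD server's structure. */
struct open_state {
	uint32_t access;	/* share_access bits currently held */
	uint32_t deny;		/* share_deny bits currently held */
};

/*
 * Return true only if the request keeps a non-empty subset of the
 * access bits already held and adds no deny bits. A request that
 * adds bits should have been an Open/ClaimNull instead.
 */
static bool
downgrade_is_valid(const struct open_state *cur, uint32_t access,
    uint32_t deny)
{
	assert(cur != NULL);
	if (access == 0)
		return (false);		/* must keep some access */
	if ((access & ~cur->access) != 0)
		return (false);		/* would add access bits */
	if ((deny & ~cur->deny) != 0)
		return (false);		/* would add deny bits */
	return (true);
}
```

With an open that holds only read access and no deny bits, a "drive lock" style
request with every access and deny bit set fails this check, since it adds bits
instead of removing them.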

The rest of the really nasty stuff happens after a server reboot. The recovery code
seemed to be badly broken in the 6.5 client. (All sorts of fun stuff like the client
looping doing ExchangeID operations forever. VM crashes...)

>That completely fixed the connection instability, but the NFS share was still mounting
>read-only with a RECLAIM_COMPLETE error.  So, I manually applied the first patch
>from the previous thread and everything started working:
>
>    --- fs/nfsserver/nfs_nfsdserv.c.savrecl     2018-02-10 20:34:31.166445000 -0500
>    +++ fs/nfsserver/nfs_nfsdserv.c     2018-02-10 20:36:07.947490000 -0500
>    @@ -4226,10 +4226,9 @@ nfsrvd_reclaimcomplete(struct nfsrv_desc
>            goto nfsmout;
>        }
>        NFSM_DISSECT(tl, uint32_t *, NFSX_UNSIGNED);
>    +   nd->nd_repstat = nfsrv_checkreclaimcomplete(nd);
>        if (*tl == newnfs_true)
>    -           nd->nd_repstat = NFSERR_NOTSUPP;
>    -   else
>    -           nd->nd_repstat = nfsrv_checkreclaimcomplete(nd);
>    +           nd->nd_repstat = 0;
I think this patch is ok to use, since no other extant client does a ReclaimComplete
with "one_fs == true". It does kinda violate the RFC.
The problem is that FreeBSD exports a hierarchy of file systems and telling the
server that one of them has been reclaimed is useless. (This hack just assumes
the client meant to say "one_fs == false".)
There was also a case (I think it was after a server reboot) where the client would
do one of these after doing a ReclaimComplete with "one_fs == false" and that is
definitely bogus (the server would reply NFS4ERR_COMPLETE_ALREADY without
the above hack) since the "one_fs == false" operation means all file systems have
been reclaimed.
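
For anyone trying to follow what the hack changes, here is a small standalone
model of the logic before and after the patch. It is only a sketch:
check_reclaim_complete() stands in for nfsrv_checkreclaimcomplete() and I am
assuming that function just records the first call and rejects repeats, the
client_state struct is hypothetical, and the error values are the ones from
RFC 5661 (where the "already complete" error is named NFS4ERR_COMPLETE_ALREADY).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Status codes as numbered in RFC 5661. */
#define NFS4_OK				0
#define NFS4ERR_NOTSUPP			10004
#define NFS4ERR_COMPLETE_ALREADY	10054

/* Hypothetical per-client record; NOT the server's real structure. */
struct client_state {
	bool reclaim_done;
};

/*
 * Stand-in for nfsrv_checkreclaimcomplete() (assumed behaviour):
 * mark reclaim as done, or report that it was already done.
 */
static int
check_reclaim_complete(struct client_state *clp)
{
	assert(clp != NULL);
	if (clp->reclaim_done)
		return (NFS4ERR_COMPLETE_ALREADY);
	clp->reclaim_done = true;
	return (NFS4_OK);
}

/* Unpatched logic: one_fs == true is rejected outright. */
static int
reclaim_complete_orig(struct client_state *clp, bool one_fs)
{
	if (one_fs)
		return (NFS4ERR_NOTSUPP);
	return (check_reclaim_complete(clp));
}

/*
 * Patched logic: always record the reclaim, but report success for
 * one_fs == true, i.e. treat it as if the client had sent
 * one_fs == false and hide any "already complete" reply.
 */
static int
reclaim_complete_patched(struct client_state *clp, bool one_fs)
{
	int status;

	status = check_reclaim_complete(clp);
	if (one_fs)
		status = NFS4_OK;
	return (status);
}
```

In this model the unpatched path never marks the client as reclaimed when
"one_fs == true", while the patched path does and always answers OK, including
when the client repeats the operation after a "one_fs == false" call.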

Anyhow, once I get some packet traces from Andreas for 6.7, I'll try and figure
out how to handle at least some of the outstanding issues.

Good luck with it, rick



More information about the freebsd-stable mailing list