[Bug 249567] NFSv4 server sometimes responds with NFSERR_INVAL to LOCK from Linux clients

bugzilla-noreply at freebsd.org bugzilla-noreply at freebsd.org
Thu Sep 24 09:39:09 UTC 2020


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=249567

            Bug ID: 249567
           Summary: NFSv4 server sometimes responds with NFSERR_INVAL to
                    LOCK from Linux clients
           Product: Base System
           Version: 11.4-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: bugs at FreeBSD.org
          Reporter: bf at cebitec.uni-bielefeld.de
 Attachment #218237
         mime type: text/plain

Created attachment 218237
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=218237&action=edit
fcntl F_SETLK test case

We run an NFSv4 server based on 11.4-RELEASE-p3 with ZFS as the backing store.
Clients are mostly Solaris and Linux. Sometimes "svn checkout" or "svn status"
fails with a disk I/O error on NFS volumes on Linux clients. We tracked the
problem down to sqlite, which uses fcntl locking quite intensively. From this,
we created a small test case which simply does

  res = fcntl(fd, F_SETLK, &lock);
  [...]
  res = fcntl(fd, F_SETLK, &unlock);

in a loop (see attachment). This reliably triggers the problem:

  [me at linuxhost:~]$ fcntl_setlk
  F_SETLK F_RDLCK res = -1 (Invalid argument)
  Successful cycles: 431
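
The attachment itself is not inlined in this message. As a rough sketch only,
a lock-unlock loop of this kind could look like the following. The function
names set_lock() and lock_unlock_cycles() are hypothetical, and the byte range
(offset 0, length 1) and the F_RDLCK lock type are taken from the output above
and the LOCK request seen on the wire, not from the attached file:

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Apply one non-blocking fcntl() record-lock operation to the first byte
 * of fd. type is F_RDLCK to take a read lock or F_UNLCK to release it.
 * (Hypothetical helper, not taken from the attachment.)
 */
static int
set_lock(int fd, short type)
{
	struct flock fl;

	memset(&fl, 0, sizeof(fl));
	fl.l_type = type;
	fl.l_whence = SEEK_SET;
	fl.l_start = 0;		/* offset 0 */
	fl.l_len = 1;		/* length 1 */
	return (fcntl(fd, F_SETLK, &fl));
}

/*
 * Lock and unlock `path` up to max_cycles times. Returns the number of
 * complete lock-unlock cycles, or -1 if the file cannot be opened.
 */
static long
lock_unlock_cycles(const char *path, long max_cycles)
{
	long done = 0;
	int fd;

	fd = open(path, O_RDWR | O_CREAT, 0644);
	if (fd == -1) {
		perror("open");
		return (-1);
	}
	while (done < max_cycles) {
		if (set_lock(fd, F_RDLCK) == -1) {
			printf("F_SETLK F_RDLCK res = -1 (%s)\n",
			    strerror(errno));
			break;
		}
		if (set_lock(fd, F_UNLCK) == -1) {
			printf("F_SETLK F_UNLCK res = -1 (%s)\n",
			    strerror(errno));
			break;
		}
		done++;
	}
	printf("Successful cycles: %ld\n", done);
	close(fd);
	return (done);
}
```

A main() that calls lock_unlock_cycles() with a path on the affected NFSv4
mount and a large cycle limit would be expected to stop early once the server
rejects a LOCK; on a local file system the loop should run to the limit.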

So here fcntl fails after 431 successful lock-unlock cycles. On the wire, we can
see the LOCK request as:

  Opcode: LOCK (12)
      locktype: READ_LT (1)
      reclaim?: No
      offset: 0
      length: 1
      new lock owner?: Yes
      seqid: 0x00000000
      StateID
          [StateID Hash: 0xdf2d]
          StateID seqid: 1
          StateID Other: d92b565f140000008c0c0000
          [StateID Other hash: 0x15f7]
      lock_seqid: 0x00000000
      Owner
          clientid: 0xd92b565f14000000
          owner: <DATA>
              length: 20
              contents: <DATA>

Lock requests that succeed look exactly the same. In a failing case, the
FreeBSD NFS server replies:

  Opcode: LOCK (12)
      Status: NFS4ERR_INVAL (22)

Using DTrace, we found the source of the NFS4ERR_INVAL in nfsrv_lockctrl() at
nfs_nfsdstate.c:1810:

    if (!error)
        nfsrv_getowner(&stp->ls_open, new_stp, &lckstp);
    if (lckstp)
        /*  
         * I believe this should be an error, but it
         * isn't obvious what NFSERR_xxx would be
         * appropriate, so I'll use NFSERR_INVAL for now.
         */  
        error = NFSERR_INVAL;
    else
        lckstp = new_stp;

As a workaround we tried to simply comment out the setting of "error". With
this change, the test case no longer triggers the problem:

--- nfs_nfsdstate.c     2020/09/23 12:58:37     1.1
+++ nfs_nfsdstate.c     2020/09/23 14:16:19
@@ -1802,12 +1802,17 @@
                        if (!error)
                           nfsrv_getowner(&stp->ls_open, new_stp, &lckstp);
                        if (lckstp)
+#ifdef DIAGNOSTIC
+                          printf("nfs_nfsdstate.c:1805: I believe this should be an error\n");
+#else
+                          ;
+#endif
                           /*
                            * I believe this should be an error, but it
                            * isn't obvious what NFSERR_xxx would be
                            * appropriate, so I'll use NFSERR_INVAL for now.
-                           */
                           error = NFSERR_INVAL;
+                           */
                        else
                           lckstp = new_stp;
                   } else if (new_stp->ls_flags&(NFSLCK_LOCK|NFSLCK_UNLOCK)) {

While this seems to work, I have a gut feeling that lckstp should be new_stp
(unconditionally) instead of what nfsrv_getowner returns. Someone with a deeper
understanding of the NFS specification should look into this.
