[Bug 249567] NFSv4 server sometimes responds with NFSERR_INVAL to LOCK from Linux clients
bugzilla-noreply at freebsd.org
Thu Sep 24 09:39:09 UTC 2020
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=249567
Bug ID: 249567
Summary: NFSv4 server sometimes responds with NFSERR_INVAL to
LOCK from Linux clients
Product: Base System
Version: 11.4-RELEASE
Hardware: amd64
OS: Any
Status: New
Severity: Affects Some People
Priority: ---
Component: kern
Assignee: bugs at FreeBSD.org
Reporter: bf at cebitec.uni-bielefeld.de
Attachment #218237 mime type: text/plain
Created attachment 218237
--> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=218237&action=edit
fcntl F_SETLK test case
We run an NFSv4 server based on 11.4-RELEASE-p3 with ZFS as backing store.
Clients are mostly Solaris and Linux. Sometimes "svn checkout" or "svn status"
fails with a disk I/O error on NFS volumes on Linux clients. We tracked the
problem down to SQLite, which uses fcntl locking quite intensively. From this,
we created a small test case which simply does
res = fcntl(fd, F_SETLK, &lock);
[...]
res = fcntl(fd, F_SETLK, &unlock);
in a loop (see attachment). This reliably triggers the problem:
[me at linuxhost:~]$ fcntl_setlk
F_SETLK F_RDLCK res = -1 (Invalid argument)
Successful cycles: 431
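For readers without access to the attachment, a loop of this kind can be
sketched roughly as below. This is not the attached test case itself, only a
minimal reconstruction: the function name lock_cycles, its path and
max_cycles parameters, and the choice to open the file with O_CREAT are
assumptions made for illustration. The lock range (offset 0, length 1, read
lock) mirrors the LOCK request shown in the wire trace.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the attachment): repeatedly take and
 * release a one-byte read lock on `path` with fcntl(F_SETLK).
 * Returns the number of complete lock/unlock cycles, or -1 if the
 * file could not be opened.
 */
int lock_cycles(const char *path, int max_cycles)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) {
        perror("open");
        return -1;
    }

    /* Read lock on byte 0, matching offset 0 / length 1 in the trace. */
    struct flock lock;
    memset(&lock, 0, sizeof(lock));
    lock.l_type = F_RDLCK;
    lock.l_whence = SEEK_SET;
    lock.l_start = 0;
    lock.l_len = 1;

    /* Same range, but releasing the lock. */
    struct flock unlock = lock;
    unlock.l_type = F_UNLCK;

    int cycles = 0;
    while (cycles < max_cycles) {
        if (fcntl(fd, F_SETLK, &lock) == -1) {
            printf("F_SETLK F_RDLCK res = -1 (%s)\n", strerror(errno));
            break;
        }
        if (fcntl(fd, F_SETLK, &unlock) == -1) {
            printf("F_SETLK F_UNLCK res = -1 (%s)\n", strerror(errno));
            break;
        }
        cycles++;
    }

    printf("Successful cycles: %d\n", cycles);
    close(fd);
    return cycles;
}
```

Called on a file that lives on the affected NFS mount, for example
lock_cycles("testfile", 100000), a return value smaller than max_cycles
would correspond to the failure shown above. On a local file system every
cycle is expected to succeed.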
So here fcntl fails after 431 successful lock-unlock cycles. On the wire, we can
see the LOCK request as:
Opcode: LOCK (12)
locktype: READ_LT (1)
reclaim?: No
offset: 0
length: 1
new lock owner?: Yes
seqid: 0x00000000
StateID
[StateID Hash: 0xdf2d]
StateID seqid: 1
StateID Other: d92b565f140000008c0c0000
[StateID Other hash: 0x15f7]
lock_seqid: 0x00000000
Owner
clientid: 0xd92b565f14000000
owner: <DATA>
length: 20
contents: <DATA>
Lock requests that succeed look exactly the same. In the failing case the
FreeBSD NFS server replies:
Opcode: LOCK (12)
Status: NFS4ERR_INVAL (22)
Using DTrace, we found the source of the NFS4ERR_INVAL in nfsrv_lockctrl() at
nfs_nfsdstate.c:1810:
    if (!error)
        nfsrv_getowner(&stp->ls_open, new_stp, &lckstp);
    if (lckstp)
        /*
         * I believe this should be an error, but it
         * isn't obvious what NFSERR_xxx would be
         * appropriate, so I'll use NFSERR_INVAL for now.
         */
        error = NFSERR_INVAL;
    else
        lckstp = new_stp;
As a workaround we tried to simply comment out the setting of "error". With
this change, the test case no longer triggers the problem:
--- nfs_nfsdstate.c 2020/09/23 12:58:37 1.1
+++ nfs_nfsdstate.c 2020/09/23 14:16:19
@@ -1802,12 +1802,17 @@
if (!error)
nfsrv_getowner(&stp->ls_open, new_stp, &lckstp);
if (lckstp)
+#ifdef DIAGNOSTIC
+ printf("nfs_nfsdstate.c:1805: I believe this should be an error\n");
+#else
+ ;
+#endif
/*
* I believe this should be an error, but it
* isn't obvious what NFSERR_xxx would be
* appropriate, so I'll use NFSERR_INVAL for now.
- */
error = NFSERR_INVAL;
+ */
else
lckstp = new_stp;
} else if (new_stp->ls_flags&(NFSLCK_LOCK|NFSLCK_UNLOCK)) {
While this seems to work, I have a gut feeling that lckstp should be set to
new_stp unconditionally instead of to what nfsrv_getowner returns. Someone with
a deeper understanding of the NFS specification should look into this.