Linux NFSv4 clients: bad sequence-id errors.

Tue Feb 27 22:54:05 UTC 2018

Ruben wrote:
>I'm experiencing a strange issue on a machine providing a couple of
>nfsv4 exports. A Linux client that generates a lot of traffic to and
>from the nfs server sometimes starts throwing "bad sequence-id errors":
>
>Feb 27 10:39:42 localhost kernel: [12481477.608103] NFS: v4 server
>returned a bad sequence-id error on an unconfirmed sequence 80f7d0d0!
Sequence-id handling in NFSv4.0 is complex, and I won't even try to guess
why this is happening.
I am surprised that your Linux mounts are using NFSv4.0 and not NFSv4.1.
(Linux usually uses the most recent version supported by the server.)
I mention this because "sessions" replaced the sequence-id mechanism in
NFSv4.1, so NFSv4.1 mounts shouldn't be able to hit this error.
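
One way to confirm which NFS version each mount actually negotiated is to
read the mount options on the Linux client. The Python sketch below is only
an illustration and not part of the original report: the function name
nfs_mount_versions is made up, and it assumes the options appear in
/proc/mounts either as "vers=4.0" / "vers=4.1" or, on older kernels, as
"vers=4" plus "minorversion=N".

```python
def nfs_mount_versions(mounts_text):
    """Map mount point -> NFS version for NFS entries in /proc/mounts-style text.

    Assumption: each line is "device mountpoint fstype options dump pass",
    and the version is given by a "vers=" option, optionally combined with
    a separate "minorversion=" option.
    """
    versions = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip malformed lines and anything that is not an NFS mount.
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        vers = None
        minor = None
        for opt in fields[3].split(","):
            if opt.startswith("vers="):
                vers = opt[len("vers="):]
            elif opt.startswith("minorversion="):
                minor = opt[len("minorversion="):]
        if vers is None:
            vers = "unknown"
        elif vers == "4" and minor is not None:
            # Older style: "vers=4,minorversion=1" means NFSv4.1.
            vers = "4." + minor
        versions[fields[1]] = vers
    return versions
```

On the client, nfs_mount_versions(open("/proc/mounts").read()) returns a
dictionary of mount points and versions. If a mount reports 4.0, remounting
with the Linux mount option vers=4.1 (see nfs(5)) should make the client use
sessions, provided the server has NFSv4.1 enabled.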

>They typically occur after a couple of months of uptime on the nfsd
>machine. Every couple of seconds they are thrown by the client. The
>situation is "remedied" by restarting the nfsd on the server. Although
>functionality on the specific client does not appear to be affected
>(much?), it's a bit disturbing. I've done some digging and found:
The fact that this is fixed by restarting the nfsd suggests a client-side
problem.
Why?
Because restarting the nfsd does not reset any server state, so the sequence-id
situation would not be affected by doing this. (To get rid of server-side state,
you must unload nfsd.ko after killing off the nfsd daemon.)

All that restarting the nfsd daemon does is force the client to establish a new
TCP connection. That happens at a layer below the NFS state.

>https://lists.freebsd.org/pipermail/freebsd-fs/2015-July/021707.html
>
>and the patch attached by Rick (  nfsv41exch.patch :
>http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20150729/586f776a/attachment.bin
>) .
>
>Since the issue started manifesting itself I have restarted the nfs
>daemon (grabbed a pcap and the corresponding error lines mentioning the
>sequences prior to doing that in case anyone is interested).
If you email me the pcap as an attachment, I can take a look at it in wireshark.

>The nfs server runs FreeBSD 11.1 :
I'm being lazy and not looking, but I am almost sure the 2015 patch will be
in 11.1 and probably also in 10.2 and 11.0.

>freebsd-version -uk
>11.1-RELEASE-p1
>11.1-RELEASE-p1
>
>but I have seen it on 10.2 and 11.0 as well. The linux client is (/has
>been) running a version of Debian.
>
>The export lines in /etc/exports :
>
>V4: / -network=192.168.9.0 -mask=255.255.255.0
>
>/data/Sabnzb2015 -maproot=root: -alldirs -network=192.168.9.0
>-mask=255.255.255.0
>
>Uptime:
>
>8:27PM  up 196 days, 22:10, 1 users, load averages: 0.21, 0.17, 0.17
>
>Traffic since uptime (guessing NFS / non-NFS ratio of 3 to 1)
>
>          lagg0  in    400.901 KB/s        400.901 KB/s           10.425 TB
>                 out    32.781 KB/s         32.781 KB/s           14.132 TB
>
>
>I'm wondering: can the 2015 patch provided by Rick still be "safely"
>applied or has the nfs code changed too much since then? I've witnessed
>this issue a couple of times now and would very much like to test the
>patch provided.
As above, I'd be surprised if the patch isn't already in your 11.1 kernel,
but you can take a look.
If it isn't, let me know because that means it slipped through the cracks
and I need to get it committed, etc.

rick