Linux NFSv4 clients are getting (bad sequence-id error!)

Rick Macklem rmacklem at uoguelph.ca
Thu Jul 2 00:25:07 UTC 2015


Xin Li wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA512
> 
> On 07/01/15 16:44, Ahmed Kamal via freebsd-fs wrote:
> > The not so great news is, after updating sysctl and rebooting the
> > nas box, I still saw a few (NFS: v4 server nas  returned a bad
> > sequence-id error!) lines in logs. Users have already left, so I
> > don't know how bad it is ..
> > 
> > Could you share more info on what this error means? RedHat seems to
> > think the client can skip-by-1 and choose larger IDs and that would
> > be totally fine? Also, how serious is this error? Would it cause
> > an NFS session stall like that?
> 
Ok, I see the "skip-by-1" comment in the RedHat stuff. I have no idea
where they got that from, because I certainly don't read it that way in
the RFC.
I'll post on nfsv4 at ietf.org to see what the story is on this.

(Very recently, a new RFC was published that replaces RFC-3530. Maybe it
 allows this skip-by-1. It shouldn't, because the whole idea is to order
 lock operations. It may be that the OPEN case has been changed. The Linux
 client author is always trying to come up with tricks that allow OPENs to
 be done concurrently.)

This patch does look correct if this "skip-by-1" is supposed to be allowed
for all operations. It will be interesting to see if this patch resolves the
problem.

Thanks for spotting this. I would never have guessed that this "skip-by-1"
would be done.

rick

> I wonder if this would help; it loosens the check:
> 
> Index: sys/fs/nfsserver/nfs_nfsdstate.c
> ===================================================================
> --- sys/fs/nfsserver/nfs_nfsdstate.c	(revision 285016)
> +++ sys/fs/nfsserver/nfs_nfsdstate.c	(working copy)
> @@ -3805,7 +3805,8 @@ nfsrv_checkseqid(struct nfsrv_descript *nd, u_int3
>  		printf("refcnt=%d\n", stp->ls_op->rc_refcnt);
>  		panic("nfsrvstate op refcnt");
>  	}
> -	if ((stp->ls_seq + 1) == seqid) {
> +	if ((stp->ls_seq + 1) == seqid ||
> +	    (stp->ls_seq + 2) == seqid) {
>  		if (stp->ls_op)
>  			nfsrvd_derefcache(stp->ls_op);
>  		stp->ls_op = op;
> 
> 
> Personally I don't quite buy the "skip-by-1 is okay" argument, but it
> seems that the RFC text can be interpreted that way.
> 
> Cheers,
> 
> 
> > On Thu, Jul 2, 2015 at 1:36 AM, Rick Macklem <rmacklem at uoguelph.ca>
> > wrote:
> > 
> >> Ahmed Kamal wrote:
> >>> Hi all,
> >>> 
> >>> I'm a refugee from linux land. I just set up my first freebsd
> >>> 10.1 zfs box, sharing /home over nfs. Since every home directory
> >>> is its own zfs dataset, I chose to use nfsv4 to enable recursively
> >>> sharing/mounting any directory under /home (I understand nfs4 is a
> >>> must in this scenario!)
> >>> 
> >>> I'm able to mount from linux (rhel5 latest kernel) successfully.
> >>> Users are working fine. However every now and then a user screams
> >>> that his session is frozen. Usually the processes are stuck in
> >>> nfs_wait or rpc_* state. I tried using a much newer linux kernel
> >>> (3.2), however it still faced the same problem. The errors in the
> >>> Linux log files are mostly:
> >>> 
> >>> Jul  1 17:41:47 mammoth kernel: NFS: v4 server nas  returned a *bad sequence-id error*!
> >>> Jul  1 17:52:32 mammoth kernel: nfs4_reclaim_locks: unhandled error -11. Zeroing state
> >>> Jul  1 17:52:32 mammoth kernel: nfs4_reclaim_open_state: Lock reclaim failed!
> >>> 
> >> Btw, a client should only do "reclaim" operations after the
> >> server has replied with NFS4ERR_STALE_CLIENTID or
> >> NFS4ERR_STALE_STATEID. I am pretty certain that the FreeBSD NFSv4
> >> server only generates these replies after it has rebooted, so
> >> assuming the server didn't reboot, I have no idea why the client
> >> would attempt these and am not surprised they failed.
> >> 
> >> I'm guessing that the DRC constipation somehow caused the Linux
> >> client to go into recovery mode?
> >> 
> >> rick
> >> 
> >>> My search led me to
> >>> (https://access.redhat.com/solutions/1328073) a detailed
> >>> analysis of the issue, which you can read over here
> >>> https://dl.dropboxusercontent.com/u/51939288/nfs4-bad-seq.pdf
> >>> .. NetApp confirmed this was a bug for them (I'm wondering if
> >>> this is still in FreeBSD?!)
> >>> 
> >>> PS: Right before sending this, I saw dmesg on the freebsd box
> >>> advising increasing vfs.nfsd.tcphighwater .. So I up'ed that to
> >>> 64000. I also up'ed the number of nfs server threads (-t) from 10
> >>> to 60 (we're roughly 40 linux machines)
> >>> 
> >>> Any advice is most appreciated!
> >>> 
> >>> Thanks
> >>> _______________________________________________
> >>> freebsd-fs at freebsd.org mailing list
> >>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> >>> To unsubscribe, send any mail to
> >>> "freebsd-fs-unsubscribe at freebsd.org"
> >>> 
> >> 
> > 
> 
> 
> --
> Xin LI <delphij at delphij.net>    https://www.delphij.net/
> FreeBSD - The Power to Serve!           Live free or die
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.1.5 (FreeBSD)
> 
> iQIcBAEBCgAGBQJVlHxuAAoJEJW2GBstM+nskvsP/ire8QyTfL6mF1njMNZwI/k5
> AQ+BwWs5r8LzcRN/4v7/gelbS+lXnYVbVHMl8q6j+HzUzQ3yId4ZGlJWpJtHDNnj
> +gV8kmFt/og1QTrQRbN81i4GEr914SlKWmo7LsxrWmEhAiKsN0sYsjELD/mH5BZX
> 1wRe3vTvyrMwm+6u1krqT8ZrxRANBFBmNqiFb8sag7B3oJQZsGhAyUSsJvUhb00o
> ozwC2NT5y8Jv0QcZdC/wGeYc8FmRNQTAjE22WkzbsUey/e7FxL7vflCGgngYCIxE
> zbZNW65xThZO8fti5MxiepJ27VPa5ocX0CQihBFYp5haG6fzWBGalV/ggAOwYL44
> nz1caLhdKIj9JSd8QwLdTArq8+6H8Sx4jp4iGzQnppNo8PtG/AlHlw9uDKaUF4iw
> H+tMb6qMu2FQJ9X+phtplzvjZxCbBbwY205GeTm5eElOkYzIyYvqIvZasos02ze0
> v3SQXtpIHjrnndXMVNRJOkhYquGxVFxUm5IJ7o+0wrgVJp1V3cBKd4vs0o84Mgu5
> EPGKCyt8x/B6ujCxkunODpNOb+sFyq6aqsDLAO6JSih5HfQntpxoZTjm8p4KjsG6
> nPqXQXmi2NoOd6WPOunp7w/y+fKA4YdLAhPC7rbXQwpLL81UqNH141BrtscN0ovi
> pyRlJ4r3Zs75qUwVSkzL
> =3/OG
> -----END PGP SIGNATURE-----
> 

