Linux NFSv4 clients are getting (bad sequence-id error!)

Wed Jul 8 14:19:19 UTC 2015

Hi folks,

I have tested Xin's patches. Unfortunately the problem didn't go away :/
Many users are still reporting hung processes. If it would help, can you
show me how to capture a network trace that would let you identify the issue?

Also, is there any way to make my trusted nfs3 handle the case
where every zfs /home folder is its own dataset?

On Mon, Jul 6, 2015 at 12:28 AM, Rick Macklem <rmacklem at uoguelph.ca> wrote:

> Ahmed Kamal wrote:
> > Hi folks,
> >
> > Just a quick update. I did not test Xin's patches yet .. What I did so far
> > is to increase the tcp highwater tunable and increase nfsd threads to 60.
> > Today (a working day) I noticed I only got one bad sequence error message!
> > Check this:
> >
> > # grep 'bad sequence' messages* | awk '{print $1 $2}' | uniq -c
> >       1 messages:Jul5
> >      39 messages.1:Jun28
> >      15 messages.1:Jun29
> >       4 messages.1:Jun30
> >       9 messages.1:Jul1
> >      23 messages.1:Jul2
> >       1 messages.1:Jul4
> >       1 messages.2:Jun28
> >
> > So there seems to be an improvement! Not sure if the Linux nfs4 client is
> > able to somehow recover from those bad-sequence situations or not .. I did
> > get some user complaints that running "ls -l" is sometimes slow and takes a
> > couple of seconds to finish.
> >
> > One final question .. Do you folks think nfs4.1 is more reliable in general
> > than nfs4 .. I've always only used nfs3 (I guess it can't work here with
> > /home/* being separate zfs filesystems) .. So should I go through the pain
> > of upgrading a few servers to RHEL-6 to try out nfs4.1 ? Basically do you
> > expect the protocol to be more solid ? I know it's a fluffy question, just
> > give me your thoughts. Thanks a lot!
> >
> All I can say is that the "bad seqid" errors should not occur with NFSv4.1,
> since it doesn't use the seqid#s to order RPCs.
>
> Also I would say that a correctly implemented NFSv4.1 protocol should function
> "more correctly" since all RPCs are performed "exactly once". (How much effect
> this will have in practice, I can't say.)
>
> On the other hand, NFSv4.1 is a newer protocol (with an RFC of over 500 pages),
> so it is hard to say how mature the implementations are.
> I think only testing will give you the answer.
>
> I would suggest that you test Xin Li's patch that allows the "seqid + 2" case
> and see if that makes the "bad seqid" errors go away. (Even though I think this
> would indicate a client bug, adding this in a way that it can be enabled via
> a sysctl seems reasonable.)
>
> Btw, I haven't seen any additional posts from nfsv4 at ietf.org on this, rick
>
> >
> >
> > On Fri, Jul 3, 2015 at 2:51 AM, Rick Macklem <rmacklem at uoguelph.ca> wrote:
> >
> > > Ahmed Kamal wrote:
> > > > PS: Today (after adjusting tcp.highwater) I didn't get any screaming
> > > > reports from users about hung vnc sessions. So maybe just maybe, linux
> > > > clients are able to somehow recover from these bad sequence messages. I
> > > > could still see the bad sequence error message in logs though
> > > >
> > > > Why isn't the highwater tunable set to something better by default ? I mean
> > > > this server is certainly not under a high or unusual load (it's only 40 PCs
> > > > mounting from it)
> > > >
> > > > On Fri, Jul 3, 2015 at 1:15 AM, Ahmed Kamal <email.ahmedkamal at googlemail.com> wrote:
> > > >
> > > > > Thanks all .. I understand now we're doing the "right thing" .. Although
> > > > > if mounting keeps wedging, I will have to solve it somehow! Either using
> > > > > Xin's patch .. or Upgrading RHEL to 6.x and using NFS4.1.
> > > > >
> > > > > Regarding Xin's patch, is it possible to build the patched nfsd code as a
> > > > > kernel module ? I'm looking to minimize my delta to upstream.
> > > > >
> > > Yes, you can build the nfsd as a module. If your kernel config does not include
> > > "options NFSD" the module will get loaded/used. It is also possible to replace
> > > the module without rebooting, but you need to kill off the nfsd daemon, then
> > > kldunload nfsd.ko and replace nfsd.ko with the new one. (In
> > > /boot/<kernel-name>.)
> > >
> > > > > Also would adopting Xin's patch and hiding it behind a
> > > > > kern.nfs.allow_linux_broken_client be an option (I'm probably not the last
> > > > > person on earth to hit this) ?
> > > > >
> > > If it fixes your problem, I think this is reasonable.
> > > I'm also hoping that someone that works on the Linux client reports
> > > if/when this was changed.
> > >
> > > rick
> > >
> > > > > Thanks a lot for all the help!
> > > > >
> > > > > On Thu, Jul 2, 2015 at 11:53 PM, Rick Macklem <rmacklem at uoguelph.ca> wrote:
> > > > >
> > > > >> Ahmed Kamal wrote:
> > > > >> > Appreciating the fruitful discussion! Can someone please explain to me,
> > > > >> > what would happen in the current situation (linux client doing this
> > > > >> > skip-by-1 thing, and freebsd not doing it) ? What is the effect of that?
> > > > >> Well, as you've seen, the Linux client doesn't function correctly against
> > > > >> the FreeBSD server (and probably others that don't support this
> > > > >> "skip-by-1" case).
> > > > >>
> > > > >> > What do users see? Any chances of data loss?
> > > > >> Hmm. Mostly it will cause Opens to fail, but I can't guess what the Linux
> > > > >> client behaviour is after receiving NFS4ERR_BAD_SEQID. You're the guy
> > > > >> observing it.
> > > > >>
> > > > >> >
> > > > >> > Also, I find it strange that netapp have acknowledged this is a bug on
> > > > >> > their side, which has been fixed since then!
> > > > >> Yea, I think Netapp screwed up. For some reason their server allowed this,
> > > > >> then was fixed to not allow it and then someone decided that was broken
> > > > >> and reversed it.
> > > > >>
> > > > >> > I also find it strange that I'm the first to hit this :) Is no one
> > > > >> > running nfs4 yet!
> > > > >> >
> > > > >> Well, it seems to be slowly catching on. I suspect that the Linux client
> > > > >> mounting a Netapp is the most common use of it. Since it appears that they
> > > > >> flip flopped w.r.t. whose bug this is, it has probably persisted.
> > > > >>
> > > > >> It may turn out that the Linux client has been fixed or it may turn out
> > > > >> that most servers allowed this "skip-by-1" even though David Noveck (one
> > > > >> of the main authors of the protocol) seems to agree with me that it should
> > > > >> not be allowed.
> > > > >>
> > > > >> It is possible that others have bumped into this, but it wasn't isolated
> > > > >> (I wouldn't have guessed it, so it was good you pointed to the RedHat
> > > > >> discussion) and they worked around it by reverting to NFSv3 or similar.
> > > > >> The protocol is rather complex in this area and changed completely for
> > > > >> NFSv4.1, so many have also probably moved onto NFSv4.1 where this won't be
> > > > >> an issue.
> > > > >> (NFSv4.1 uses sessions to provide exactly once RPC semantics and doesn't
> > > > >> use these seqid fields.)
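The exactly-once behaviour mentioned above can be pictured with a rough sketch of a per-slot reply cache. This is only an illustration under stated assumptions, not the FreeBSD or Linux code: the names `slot`, `slot_call`, `slot_result` and `bump` are made up here, an `int` stands in for a real RPC reply, and a real session has many slots plus more error handling. The idea is that a retransmitted request is answered from the cache instead of being executed a second time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outcome codes for this sketch. */
enum slot_result {
    SLOT_EXECUTED,   /* new request: executed once, reply cached */
    SLOT_REPLAYED,   /* retransmission: cached reply returned, not re-executed */
    SLOT_MISORDERED  /* unexpected sequence number: rejected */
};

/* One slot of a session's reply cache (simplified). */
struct slot {
    uint32_t seq;          /* sequence number of the last executed request */
    int      cached_reply; /* reply to that request; int stands in for a real reply */
};

/* Operation to run for a new request; returns its reply. */
typedef int (*op_fn)(void *arg);

enum slot_result
slot_call(struct slot *s, uint32_t seq, op_fn op, void *arg, int *reply)
{
    assert(s != NULL && op != NULL && reply != NULL);

    if (seq == s->seq) {
        /* Same sequence number as last time: hand back the cached reply. */
        *reply = s->cached_reply;
        return SLOT_REPLAYED;
    }
    if (seq == (uint32_t)(s->seq + 1)) {
        /* Next request on this slot: execute it exactly once and cache it. */
        s->cached_reply = op(arg);
        s->seq = seq;
        *reply = s->cached_reply;
        return SLOT_EXECUTED;
    }
    /* Anything else is out of order for this slot. */
    return SLOT_MISORDERED;
}

/* Example operation: bumps a counter so a re-execution would be visible. */
int
bump(void *arg)
{
    int *counter = arg;
    return ++*counter;
}
```

Calling `slot_call` twice with the same sequence number runs `bump` only once; the second call just returns the cached reply, which is why a lost reply cannot cause an open or lock to be applied twice.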
> > > > >>
> > > > >> This is all just mho, rick
> > > > >>
> > > > >> > On Thu, Jul 2, 2015 at 1:59 PM, Rick Macklem <rmacklem at uoguelph.ca> wrote:
> > > > >> >
> > > > >> > > Julian Elischer wrote:
> > > > >> > > > On 7/2/15 9:09 AM, Rick Macklem wrote:
> > > > >> > > > > I am going to post to nfsv4 at ietf.org to see what they say. Please
> > > > >> > > > > let me know if Xin Li's patch resolves your problem, even though I
> > > > >> > > > > don't believe it is correct except for the UINT32_MAX case. Good
> > > > >> > > > > luck with it, rick
> > > > >> > > > and please keep us all in the loop as to what they say!
> > > > >> > > >
> > > > >> > > > the general N+2 bit sounds like bullshit to me.. it's always N+1 in a
> > > > >> > > > number field that has a
> > > > >> > > > bit of slack at wrap time (probably due to some ambiguity in the
> > > > >> > > > original spec).
> > > > >> > > >
> > > > >> > > Actually, since N is the lock op already done, N + 1 is the next lock
> > > > >> > > operation in order. Since lock ops need to be strictly ordered, allowing
> > > > >> > > N + 2 (which means N + 2 would be done before N + 1) makes no sense.
> > > > >> > >
> > > > >> > > I think the author of the RFC meant that N + 2 or greater fails, but it
> > > > >> > > was poorly worded.
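The ordering rule argued above can be written down as a small check. This is a minimal sketch, assuming a few things that are not in the thread: the names `owner_state`, `seqid_check` and the result codes are hypothetical (they are not the FreeBSD nfsd names), the replay case for a repeated N follows the NFSv4.0 RFC rather than anything said here, and wraparound is handled with plain unsigned 32-bit arithmetic even though the UINT32_MAX case may need special treatment. The `allow_skip` flag only models the sysctl-style switch that was proposed for tolerating the Linux "skip-by-1" behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes for this sketch. */
enum seqid_result {
    SEQID_NEXT,    /* N + 1: the next operation in order, process it */
    SEQID_REPLAY,  /* N: retransmission of the last operation */
    SEQID_SKIPPED, /* N + 2: tolerated only when allow_skip is set */
    SEQID_BAD      /* anything else: server would reply NFS4ERR_BAD_SEQID */
};

/* Per-owner state: seqid of the last open/lock operation already done. */
struct owner_state {
    uint32_t last_seqid;
};

/*
 * Classify an incoming seqid against N = o->last_seqid.
 * allow_skip is a hypothetical stand-in for the proposed sysctl.
 */
enum seqid_result
seqid_check(const struct owner_state *o, uint32_t seqid, bool allow_skip)
{
    assert(o != NULL);

    /* Distance from N, computed modulo 2^32 (assumption of this sketch). */
    uint32_t delta = seqid - o->last_seqid;

    if (delta == 1)
        return SEQID_NEXT;
    if (delta == 0)
        return SEQID_REPLAY;
    if (delta == 2 && allow_skip)
        return SEQID_SKIPPED;
    return SEQID_BAD;
}
```

With `allow_skip` false, a client that sends N + 2 gets the bad-seqid result, which matches the errors seen in the server logs; with it true, only the single skipped step is tolerated and larger gaps are still rejected.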
> > > > >> > >
> > > > >> > > I will pass along whatever I get from nfsv4 at ietf.org. (There is an
> > > > >> > > archive of it somewhere, but I can't remember where.;-)
> > > > >> > >
> > > > >> > > rick
> > > > >> > > _______________________________________________
> > > > >> > > freebsd-fs at freebsd.org mailing list
> > > > >> > > http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> > > > >> > > To unsubscribe, send any mail to "freebsd-fs-unsubscribe at freebsd.org"