Linux NFSv4 clients are getting (bad sequence-id error!)

Ahmed Kamal email.ahmedkamal at googlemail.com
Thu Jul 23 15:27:12 UTC 2015


Well .. The problem is now gone, so I guess I can't collect more data till
it happens (or hopefully doesn't :) happen again .. As I described, I had to
restart the FreeBSD NFS server box first .. maybe this caused the Linux
clients to give up after 5 minutes and attempt to destroy the session? When
the NFS server was back up, it was being bombarded with RPC traffic (50Mbps),
probably carrying this "destroy session" request.
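
If it does happen again, something rough like this should at least show which
clients are generating the storm (just a sketch; em0 is a placeholder for the
server's interface):

  # count packets per source endpoint talking to the NFS port, over a short sample
  tcpdump -n -i em0 -c 2000 port 2049 | awk '{print $3}' | sort | uniq -c | sort -rn | head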

What I don't understand, however, is why this doesn't end. What does FreeBSD
reply with? Shouldn't it say: okay, I don't know anything about this session,
so consider it destroyed .. suit yourself, Linux .. or does it refuse to
destroy it, causing Linux to keep retrying like crazy?

On Thu, Jul 23, 2015 at 12:40 PM, Rick Macklem <rmacklem at uoguelph.ca> wrote:

> Ahmed Kamal wrote:
> > The rhel6 servers' logs were flooded with errors like:
> > http://paste2.org/EwLGcGF6
> > The FreeBSD box was being pounded with 40Mbps of nfs traffic .. probably
> > Linux was retrying too hard ?! I had to reboot all PCs, and after the last
> > one, nfsd CPU usage dropped immediately to zero.
> >
> Btw, it would be interesting to know what triggers these things (overload of
> the nfs server resulting in very slow response or ???). Basically
> Destroy_session isn't an operation that a client would normally do. I have
> no idea why the Linux client would do it. (A session is what achieves the
> "exactly once" semantics for the RPCs. It should really be in the RPC layer,
> but the NFSv4 working group put it in NFSv4.1 because they didn't want to
> replace Sun RPC. I can't think of a reason to destroy a session except on
> dismount. Maybe if the client thinks the session is broken for some
> reason??)
>
> Maybe run something like "vmstat -m", "vmstat -z" and "nfsstat -s -e"
> repeatedly (once/sec, with timestamps via "date" or similar), so that you
> can see what was happening just before the meltdowns.
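
Something like this minimal loop should cover that (just a sketch; the log
path is arbitrary):

  #!/bin/sh
  # sample server state once per second, with timestamps, into one log
  while true; do
      date
      vmstat -m
      vmstat -z
      nfsstat -s -e
      sleep 1
  done >> /var/tmp/nfs-state.log 2>&1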
>
> A raw packet trace of just when the meltdown starts would be useful, but I
> can't think of how you'd get one of reasonable size. Maybe have
> "tcpdump -s 0 -w <file>.pcap host <client-host>" run for 1sec and then
> kill/restart it repeatedly with different file names, so you might get a
> useful 1sec capture at the critical time?
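
If it recurs, a rotation loop along these lines is probably what I'd try (a
sketch only; the interface, client address and capture directory are
placeholders):

  #!/bin/sh
  # keep rotating 1-second full captures so one of them spans the start of a meltdown
  i=0
  while true; do
      tcpdump -i em0 -s 0 -w /var/tmp/cap.$i.pcap host 192.168.1.50 &
      sleep 1
      kill $! 2>/dev/null
      i=$((i + 1))
  done
  # note: old capture files need pruning or this will eventually fill the disk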
>
> Anyhow, good luck with it, rick
>
> > On Tue, Jul 21, 2015 at 5:52 AM, Ahmed Kamal <
> > email.ahmedkamal at googlemail.com> wrote:
> >
> > > More info .. Just noticed nfsd is spinning the cpu at 500% :( I just did
> > > the dtrace with:
> > >
> > > dtrace -n 'profile-1001 { @[stack()] = count(); }'
> > > The result is at http://paste2.org/vb8ZdvF2 (scroll to bottom)
> > >
> > > Since rebooting the nfs server didn't fix it .. I imagine I'd have to
> > > reboot all NFS clients .. This would be really sad .. Any advice is most
> > > appreciated .. Thanks
> > >
> > >
> > > On Tue, Jul 21, 2015 at 5:26 AM, Ahmed Kamal <
> > > email.ahmedkamal at googlemail.com> wrote:
> > >
> > >> Hi folks,
> > >>
> > >> I've upgraded a test client to rhel6 today, and I'll keep an eye on it
> > >> to see what happens.
> > >>
> > >> During the process, I made the (I guess) mistake of zfs send | recv to
> > >> a locally attached usb disk for backup purposes .. long story short,
> > >> the sharenfs property on the received filesystem was causing some
> > >> nfs/mountd errors in the logs .. I wasn't too happy with what I got ..
> > >> I destroyed the backup datasets and eventually the whole pool .. and
> > >> then rebooted the whole nas box .. After the reboot my logs are still
> > >> flooded with
> > >>
> > >> Jul 21 05:12:36 nas kernel: nfsrv_cache_session: no session
> > >> Jul 21 05:13:07 nas last message repeated 7536 times
> > >> Jul 21 05:15:08 nas last message repeated 29664 times
> > >>
> > >> Not sure what that means .. or how it can be stopped .. Anyway, will
> > >> keep you posted on progress.
> > >>
> > >> On Fri, Jul 17, 2015 at 9:31 PM, Rick Macklem <rmacklem at uoguelph.ca>
> > >> wrote:
> > >>
> > >>> Graham Allan wrote:
> > >>> > I'm curious how things are going for you with this?
> > >>> >
> > >>> > Reading your thread did pique my interest since we have a lot of
> > >>> > Scientific Linux (RHEL clone) boxes with FreeBSD NFSv4 servers. I
> > >>> > meant to glance through our logs for signs of the same issue, but
> > >>> > today I started investigating a machine which appeared to have hung
> > >>> > processes, high rpciod load, and high traffic to the NFS server. Of
> > >>> > course it is exactly this issue.
> > >>> >
> > >>> > The affected machine is running SL5 though most of our server nodes
> > >>> > are now SL6. I can see errors from most of them but the SL6 systems
> > >>> > appear less affected - I see a stream of the sequence-id errors in
> > >>> > their logs but things in general keep working. The one SL5 machine
> > >>> > I'm looking at has a single sequence-id error in today's logs, but
> > >>> > then goes into a stream of "state recovery failed" then "Lock reclaim
> > >>> > failed". It's probably partly related to the particular workload on
> > >>> > this machine.
> > >>> >
> > >>> > I would try switching our SL6 machines to NFS 4.1 to see if the
> > >>> > behaviour changes, but 4.1 isn't supported by our 9.3 servers (is it
> > >>> > in 10.1?).
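
For what it's worth, once the server side supports it, switching an SL6/RHEL6
client to 4.1 should just be a mount option change, something like this
(hostname and paths are placeholders):

  # one-off mount
  mount -t nfs4 -o minorversion=1 nas:/export /mnt/export

  # or the equivalent /etc/fstab entry
  nas:/export  /mnt/export  nfs4  minorversion=1,rw  0 0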
> > >>> >
> > >>> Btw, I've done some testing against a fairly recent Fedora and haven't
> > >>> seen the problem. If either of you guys could load a recent Fedora on a
> > >>> test client box, it would be interesting to see if it suffers from
> > >>> this. (My experience is that the Fedora distros have more up to date
> > >>> Linux NFS clients.)
> > >>>
> > >>> rick
> > >>>
> > >>> > At the NFS servers, most of the sysctl settings are already tuned
> > >>> > from defaults. eg tcp.highwater=100000, vfs.nfsd.tcpcachetimeo=300,
> > >>> > 128-256 nfs kernel threads.
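
For anyone following along, those knobs would typically go into the server's
config like this (a sketch; I'm assuming the full sysctl name of the highwater
tunable is vfs.nfsd.tcphighwater, and the rc.conf thread flag is also from
memory):

  # /etc/sysctl.conf on the FreeBSD server
  vfs.nfsd.tcphighwater=100000
  vfs.nfsd.tcpcachetimeo=300

  # or set at runtime
  sysctl vfs.nfsd.tcphighwater=100000
  sysctl vfs.nfsd.tcpcachetimeo=300

  # nfsd thread count goes in /etc/rc.conf, e.g.
  nfs_server_flags="-u -t -n 128"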
> > >>> >
> > >>> > Graham
> > >>> >
> > >>> > On Fri, Jul 03, 2015 at 01:21:00AM +0200, Ahmed Kamal via freebsd-fs wrote:
> > >>> > > PS: Today (after adjusting tcp.highwater) I didn't get any
> > >>> > > screaming reports from users about hung vnc sessions. So maybe,
> > >>> > > just maybe, Linux clients are able to somehow recover from these
> > >>> > > bad sequence messages. I could still see the bad sequence error
> > >>> > > message in the logs though.
> > >>> > >
> > >>> > > Why isn't the highwater tunable set to something better by default?
> > >>> > > I mean this server is certainly not under a high or unusual load
> > >>> > > (it's only 40 PCs mounting from it)
> > >>> > >
> > >>> > > On Fri, Jul 3, 2015 at 1:15 AM, Ahmed Kamal <email.ahmedkamal at googlemail.com> wrote:
> > >>> > >
> > >>> > > > Thanks all .. I understand now we're doing the "right thing" ..
> > >>> > > > Although if mounting keeps wedging, I will have to solve it
> > >>> > > > somehow! Either using Xin's patch .. or upgrading RHEL to 6.x and
> > >>> > > > using NFS4.1.
> > >>> > > >
> > >>> > > > Regarding Xin's patch, is it possible to build the patched nfsd
> > >>> > > > code as a kernel module? I'm looking to minimize my delta to
> > >>> > > > upstream.
> > >>> > > >
> > >>> > > > Also, would adopting Xin's patch and hiding it behind a
> > >>> > > > kern.nfs.allow_linux_broken_client sysctl be an option (I'm
> > >>> > > > probably not the last person on earth to hit this)?
> > >>> > > >
> > >>> > > > Thanks a lot for all the help!
> > >>> > > >
> > >>> > > > On Thu, Jul 2, 2015 at 11:53 PM, Rick Macklem <rmacklem at uoguelph.ca> wrote:
> > >>> > > >
> > >>> > > >> Ahmed Kamal wrote:
> > >>> > > >> > Appreciating the fruitful discussion! Can someone please
> > >>> > > >> > explain to me what would happen in the current situation
> > >>> > > >> > (Linux client doing this skip-by-1 thing, and FreeBSD not
> > >>> > > >> > doing it)? What is the effect of that?
> > >>> > > >> Well, as you've seen, the Linux client doesn't function
> > >>> > > >> correctly against the FreeBSD server (and probably others that
> > >>> > > >> don't support this "skip-by-1" case).
> > >>> > > >>
> > >>> > > >> > What do users see? Any chances of data loss?
> > >>> > > >> Hmm. Mostly it will cause Opens to fail, but I can't guess what
> > >>> > > >> the Linux client behaviour is after receiving NFS4ERR_BAD_SEQID.
> > >>> > > >> You're the guy observing it.
> > >>> > > >>
> > >>> > > >> >
> > >>> > > >> > Also, I find it strange that Netapp have acknowledged this is
> > >>> > > >> > a bug on their side, which has been fixed since then!
> > >>> > > >> Yea, I think Netapp screwed up. For some reason their server
> > >>> > > >> allowed this, then it was fixed to not allow it, and then
> > >>> > > >> someone decided that was broken and reversed it.
> > >>> > > >>
> > >>> > > >> > I also find it strange that I'm the first to hit this :) Is no
> > >>> > > >> > one running nfs4 yet!
> > >>> > > >> >
> > >>> > > >> Well, it seems to be slowly catching on. I suspect that the
> > >>> > > >> Linux client mounting a Netapp is the most common use of it.
> > >>> > > >> Since it appears that they flip-flopped w.r.t. whose bug this
> > >>> > > >> is, it has probably persisted.
> > >>> > > >>
> > >>> > > >> It may turn out that the Linux client has been fixed, or it may
> > >>> > > >> turn out that most servers allowed this "skip-by-1", even though
> > >>> > > >> David Noveck (one of the main authors of the protocol) seems to
> > >>> > > >> agree with me that it should not be allowed.
> > >>> > > >>
> > >>> > > >> It is possible that others have bumped into this, but it wasn't
> > >>> > > >> isolated (I wouldn't have guessed it, so it was good you pointed
> > >>> > > >> to the RedHat discussion) and they worked around it by reverting
> > >>> > > >> to NFSv3 or similar.
> > >>> > > >> The protocol is rather complex in this area and changed
> > >>> > > >> completely for NFSv4.1, so many have also probably moved on to
> > >>> > > >> NFSv4.1 where this won't be an issue. (NFSv4.1 uses sessions to
> > >>> > > >> provide exactly-once RPC semantics and doesn't use these seqid
> > >>> > > >> fields.)
> > >>> > > >>
> > >>> > > >> This is all just mho, rick
> > >>> > > >>
> > >>> > > >> > On Thu, Jul 2, 2015 at 1:59 PM, Rick Macklem <rmacklem at uoguelph.ca> wrote:
> > >>> > > >> >
> > >>> > > >> > > Julian Elischer wrote:
> > >>> > > >> > > > On 7/2/15 9:09 AM, Rick Macklem wrote:
> > >>> > > >> > > > > I am going to post to nfsv4 at ietf.org to see what they
> > >>> > > >> > > > > say. Please let me know if Xin Li's patch resolves your
> > >>> > > >> > > > > problem, even though I don't believe it is correct
> > >>> > > >> > > > > except for the UINT32_MAX case. Good luck with it, rick
> > >>> > > >> > > > and please keep us all in the loop as to what they say!
> > >>> > > >> > > >
> > >>> > > >> > > > the general N+2 bit sounds like bullshit to me.. it's
> > >>> > > >> > > > always N+1 in a number field that has a bit of slack at
> > >>> > > >> > > > wrap time (probably due to some ambiguity in the original
> > >>> > > >> > > > spec).
> > >>> > > >> > > >
> > >>> > > >> > > Actually, since N is the lock op already done, N + 1 is the
> > >>> > > >> > > next lock operation in order. Since lock ops need to be
> > >>> > > >> > > strictly ordered, allowing N + 2 (which means N + 2 would be
> > >>> > > >> > > done before N + 1) makes no sense.
> > >>> > > >> > >
> > >>> > > >> > > I think the author of the RFC meant that N + 2 or greater
> > >>> > > >> > > fails, but it was poorly worded.
> > >>> > > >> > >
> > >>> > > >> > > I will pass along whatever I get from nfsv4 at ietf.org.
> > >>> > > >> > > (There is an archive of it somewhere, but I can't remember
> > >>> > > >> > > where.;-)
> > >>> > > >> > >
> > >>> > > >> > > rick
> > >>> > > >> >
> > >>> > > >>
> > >>> > > >
> > >>> > > >
> > >>> >
> > >>> > --
> > >>> > -------------------------------------------------------------------------
> > >>> > Graham Allan - allan at physics.umn.edu - gta at umn.edu - (612) 624-5040
> > >>> > School of Physics and Astronomy - University of Minnesota
> > >>> > -------------------------------------------------------------------------
> > >>>
> > >>
> > >>
> > >
> >
>

