Linux NFSv4 clients are getting (bad sequence-id error!)

Rick Macklem rmacklem at uoguelph.ca
Thu Jul 23 21:59:19 UTC 2015


Ahmed Kamal wrote:
> Can you please let me know the ultimate packet trace command I'd need to
> run in case of any nfs4 troubles .. I guess this should be comprehensive
> even at the expense of a larger output size (which we can trim later)..
> Thanks a lot for the help!
> 
tcpdump -s 0 -w <file>.pcap host <client-host-name>
(<file> refers to a file name you choose and <client-host-name> refers to
 the host name of a client generating traffic.)
--> But you can't let this run for long during the storm, or the file will
    be huge.
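
One way to keep the size bounded is tcpdump's own file rotation (-C with -W).
What follows is only a sketch under assumptions: the 100 MB file size, the
ring of 20 files, the output path and the client name are placeholders, not
values from this thread.

```shell
# Placeholders: substitute your own client host name and output path.
client="client.example.com"
out="/var/tmp/nfs-trace.pcap"

# -s 0   : capture whole packets, as in the command above
# -C 100 : start a new file after about 100 million bytes
# -W 20  : keep 20 files, then overwrite the oldest (a rotating buffer)
capture_cmd="tcpdump -s 0 -C 100 -W 20 -w $out host $client"

# Print the command for review; run it as root before the job starts.
echo "$capture_cmd"
```

With these numbers the capture tops out at roughly 2 GB on disk, and the most
recent traffic is always in the newest numbered file.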

Then you open <file>.pcap in Wireshark, which knows how to decode NFS.
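
If you would rather scan the trace from a shell first, tshark (Wireshark's
command-line tool) takes the same display filters. This is a sketch that
assumes the field name nfs.nfsstat4 and the RFC status values 10026
(NFS4ERR_BAD_SEQID) and 10052 (bad session); check both against your
Wireshark version. The file name is a placeholder.

```shell
# Placeholder capture file name.
pcap="nfs-trace.pcap"

# Assumed display filter: replies carrying a bad-seqid or bad-session status.
filter='nfs.nfsstat4 == 10026 || nfs.nfsstat4 == 10052'

# -r reads a saved capture, -Y applies a display filter.
scan_cmd="tshark -r $pcap -Y '$filter'"

# Print the command for review before running it.
echo "$scan_cmd"
```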

rick

> On Thu, Jul 23, 2015 at 11:53 PM, Rick Macklem <rmacklem at uoguelph.ca> wrote:
> 
> > Graham Allan wrote:
> > > For our part, the user whose code triggered the pathological behaviour
> > > on SL5 reran it on SL6 without incident - I still see lots of
> > > sequence-id errors in the logs, but nothing bad happened.
> > >
> > > I'd still like to ask them to rerun again on SL5 to see if the "accept
> > > skipped seqid" patch had any effect, though I think we expect not. Maybe
> > > it would be nice if I could get set up to capture rolling tcpdumps of
> > > the nfs traffic before they run that though...
> > >
> > > Graham
> > >
> > > On 7/20/2015 10:26 PM, Ahmed Kamal wrote:
> > > > Hi folks,
> > > >
> > > > I've upgraded a test client to rhel6 today, and I'll keep an eye on it
> > > > to see what happens.
> > > >
> > > > During the process, I made the (I guess mistake) of zfs send | recv to a
> > > > locally attached usb disk for backup purposes .. long story short,
> > > > sharenfs property on the received filesystem was causing some nfs/mountd
> > > > errors in logs .. I wasn't too happy with what I got .. I destroyed the
> > > > backup datasets and the whole pool eventually .. and then rebooted the
> > > > whole nas box .. After reboot my logs are still flooded with
> > > >
> > > > Jul 21 05:12:36 nas kernel: nfsrv_cache_session: no session
> > > > Jul 21 05:13:07 nas last message repeated 7536 times
> > > > Jul 21 05:15:08 nas last message repeated 29664 times
> > > >
> > > > Not sure what that means .. or how it can be stopped .. Anyway, will
> > > > keep you posted on progress.
> > >
> > Oh, I didn't see the part about "reboot" before. Unfortunately, it sounds
> > like the client isn't recovering after the session is lost. When the server
> > reboots, the client(s) will get NFS4ERR_BAD_SESSION errors back because the
> > server reboot has deleted all sessions. The NFS4ERR_BAD_SESSION should
> > trigger state recovery on the client.
> > (It doesn't sound like the clients went into recovery, starting with a
> >  Create_session operation, but without a packet trace, I can't be sure?)
> >
> > rick
> >
> > >
> > > --
> > > -------------------------------------------------------------------------
> > > Graham Allan - gta at umn.edu - allan at physics.umn.edu
> > > School of Physics and Astronomy - University of Minnesota
> > > -------------------------------------------------------------------------
> > >
> > >
> >
> 

