Panic in NFS client on CURRENT

Sat Feb 20 22:58:27 UTC 2016

Lev wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA512
> 
> On 20.02.2016 04:02, Rick Macklem wrote:
> 
> >> Basically, I'm asking if there was a server reboot or nfsd thread
> >> restart or some kind of network partition that would separate
> >> some client(s) from the server. OR Panic occurred during normal
> >> operation.
>  There was NO server reboot/restarts. MAYBE, this VM (where client
> runs) lost network connectivity for several seconds, but server itself
> was NOT stopped, restarted or rebooted.
> 
Well, the stack trace you put in the PR showed a recovery from an expired
lease. This should only occur when the client is partitioned from the
server for more than a lease duration (120sec on FreeBSD). Even a 120+sec
network partitioning won't cause an expired recovery on a FreeBSD server unless
a conflicting open/lock request is made. (A Linux server will reply NFS4ERR_EXPIRED
as soon as the lease has expired without a renewal, and Linux uses a lease of 60sec,
so it is easier to reproduce with a Linux NFSv4 server if you happen to have one.)
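
As a rough illustration of the rule described above, here is a minimal Python
sketch. It is only a model of the behaviour stated in this message, not actual
server code. The function name, the server labels, and the lease table are
hypothetical, and the lease values are the ones quoted here, so check them
against the real server configuration.

```python
# Lease durations in seconds, as quoted in this message (assumption:
# the servers run with these values; real servers may be configured differently).
LEASE_SECONDS = {"freebsd": 120, "linux": 60}


def server_replies_expired(seconds_since_renewal, server, conflicting_request):
    """Model of when a server would answer NFS4ERR_EXPIRED.

    FreeBSD: the lease must have run out AND another client must have made
    a conflicting open/lock request.
    Linux: the lease running out without a renewal is enough.
    """
    expired = seconds_since_renewal > LEASE_SECONDS[server]
    if server == "freebsd":
        return expired and conflicting_request
    return expired
```

Under this model, a partition of a few seconds returns False for either server,
which is why the expired recovery in the stack trace is surprising.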

--> So I don't know why it would go into a lease expired recovery. (A network
    partitioning of a few seconds shouldn't do it.)
I think the only way to know what caused this would be to have a packet
capture that started before the problem occurred. (Maybe your network setup is
somehow directing some RPC messages to the wrong place or they're being blocked
by some firewall setup.) If you have an NFSv4.0 mount you should see a Renew RPC
about once per minute (half a lease duration) which keeps the lease from expiring.
For NFSv4.1, it is an RPC with just a Sequence operation which should have the
same effect.
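
If the timestamps of the Renew (or Sequence-only) RPCs have been pulled out of
a capture by some means, a check along these lines could flag where the lease
might have lapsed. This is a sketch under assumptions: the input is a plain
list of times in seconds, and the function name is made up for illustration.

```python
LEASE_SECONDS = 120  # FreeBSD lease duration mentioned in this message


def find_lease_gaps(renew_times, lease=LEASE_SECONDS):
    """Return (previous, next) timestamp pairs that are more than one
    lease duration apart, i.e. intervals in which the lease could have
    expired on the server."""
    times = sorted(renew_times)
    return [(a, b) for a, b in zip(times, times[1:]) if b - a > lease]
```

For example, renewals at 0, 60, 120, 300 and 360 seconds would report the
single gap (120, 300), since 180 seconds passed without a renewal.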

Reproducing this shouldn't be easy (which is a good thing;-).
It has been a while, but it should take something like:
- Network partition a client from the server for several minutes while it has a
  file open. (It might also need to have a byte range lock on the file; I can't
  remember for sure if just an open is sufficient.)
- Try to open the same file on another client (and maybe get a conflicting byte
  range lock).
--> This should result in a reply to the client of NFS4ERR_EXPIRED.
If you look at a packet trace in wireshark, it is a server reply of NFS4ERR_EXPIRED
that tells the client to go into this recovery cycle.
Unfortunately I am away from home until April, so I don't have access to wireshark
until then. (I will try to reproduce an NFS4ERR_EXPIRED failure with the laptops
I have with me, but I'm not sure if I can pull it off.)
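
For the "file open, maybe with a byte range lock" part of the steps above, a
small helper like the following could be run on the client before it is
partitioned. It is a sketch only: the function name and parameters are
hypothetical, the path is assumed to be on the NFSv4 mount, and the network
partition itself has to be arranged separately.

```python
import fcntl
import os
import time


def hold_open_with_lock(path, hold_seconds, length=1):
    """Open `path`, take an exclusive byte range lock on its first
    `length` bytes, and keep both for `hold_seconds` seconds.

    Returns the number of bytes that were locked."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Exclusive POSIX lock on bytes [0, length) of the file.
        fcntl.lockf(fd, fcntl.LOCK_EX, length, 0, os.SEEK_SET)
        # Keep the open and the lock while the client is partitioned.
        time.sleep(hold_seconds)
    finally:
        # Closing the descriptor also drops the lock.
        os.close(fd)
    return length
```

Running the same helper against the same file from a second client during the
partition would make the conflicting open/lock request.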

Btw, this type of recovery isn't specified by the RFC and can only recover opens
and not byte range locks.

Fyi, the recovery from a server reboot (or reload of nfsd.ko in a FreeBSD server) is
specified by the RFC and can recover opens and locks. It starts with the server
replying NFS4ERR_STALE_CLIENTID or NFS4ERR_STALE_STATEID.

Good luck with it, rick
ps: I'll email if I reproduce the NFS4ERR_EXPIRED and find any problem beyond the
    panic with the fix already posted.

> - --
> // Lev Serebryakov
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2
> 
> iQJ7BAEBCgBmBQJWyEkiXxSAAAAAAC4AKGlzc3Vlci1mcHJAbm90YXRpb25zLm9w
> ZW5wZ3AuZmlmdGhob3JzZW1hbi5uZXRGOTZEMUNBMEI1RjQzMThCNjc0QjMzMEFF
> QUIwM0M1OEJGREM0NzhGAAoJEOqwPFi/3EePfCsP9AuK494J8cUft0MmvAly7yVw
> iF0R0joxwttp9t6qydMjlQfmj6yoX+UACFWWRBZGgGrS8K7PcGSsGFl5s/Bt1ylL
> lw3GDr7GsVNDOhG4ypwsiqI2Wq/PzFhBMUpuUq6A+kdqZVH1ApQFyDKrdWbvDQLx
> 9Dm6vvW/fx6W1PgJp4i2B8zSf4vz7s91JyPMXnN9IQNG/1H9WERudzx/2kp1ws9y
> wYCXVmsidMO9j0DQ4eVVSM2vSfc6VKgyjWhVeHguRXc5F3L5VGuoSXyzCkceC66r
> t+8MDYhrsm00hrkZyTO6s1KcC8OKrgZBr9p0UIM1oMaqo02DyWp7KfM1nDMW9FI6
> IXsLaizPnnf7u+gGI2SllNXMaPvcREAxrnQDHKTdifKkpXrSroYYfJGmxAsRidmY
> 8nwZ1bytGeSHlTYSq1XTJLCWsSoM/o0Vgl+bGXvajWFkFT/GRGb5akWUBZhkzo7n
> TTpm0zrLuSvqWwRvqisoAuKW7QmCF2E0ei0E01TA3DDpF31dLOCApMq4t/UooT5h
> w25dTRpc+WPUEwKXSzZ90kPHmmoRz7dn8y6Oeb681GtqoauMBgVUuWhI7+sobRBy
> gcyIIpPB1Y0vteslzd5JDRUWcDUGg23fqRgax+J+motaNEXus2P6RxZTkq3DmOgO
> qpvv/BwLVn++rVnTNWY=
> =hihT
> -----END PGP SIGNATURE-----
>