Panic in NFS client on CURRENT
rmacklem at uoguelph.ca
Sat Feb 20 22:58:27 UTC 2016
> On 20.02.2016 04:02, Rick Macklem wrote:
> >> Basically, I'm asking if there was a server reboot or nfsd thread
> >> restart or some kind of network partition that would separate
> >> some client(s) from the server. OR Panic occurred during normal
> >> operation.
> There was NO server reboot/restarts. MAYBE, this VM (where client
> runs) lost network connectivity for several seconds, but server itself
> was NOT stopped, restarted or rebooted.
Well, the stack trace you put in the PR showed a recovery from an expired
lease. This should only occur when the client is partitioned from the
server for more than a lease duration (120sec on FreeBSD). Even a 120+sec
network partitioning won't cause an expired recovery unless a conflicting
open/lock request is made, for a FreeBSD server. (A Linux server will reply NFS4ERR_EXPIRED
as soon as the lease has expired without a renewal, and Linux uses a lease of 60sec,
so it is easier to reproduce with a Linux NFSv4 server if you happen to have one.)
--> So I don't know why it would go into a lease expired recovery. (A network
partitioning of a few seconds shouldn't do it.)
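
To make the difference between the two servers concrete, here is a rough Python
sketch of the behaviour described above. It is only a model of what this message
says, not code from either server; the class, field and method names are made up
for illustration.

```python
from dataclasses import dataclass

# Lease durations as quoted in this message.
FREEBSD_LEASE_SEC = 120
LINUX_LEASE_SEC = 60


@dataclass
class LeaseModel:
    """Hypothetical model of when a server would reply NFS4ERR_EXPIRED."""
    lease_sec: int
    # True models the FreeBSD server behaviour described above: an expired
    # lease only matters once a conflicting open/lock request shows up.
    expire_only_on_conflict: bool

    def replies_expired(self, secs_since_renewal: float,
                        conflicting_request: bool) -> bool:
        # Lease still valid: no expiry in either model.
        if secs_since_renewal <= self.lease_sec:
            return False
        if self.expire_only_on_conflict:
            return conflicting_request
        # Linux-style model: expired as soon as the lease has run out.
        return True

    def renew_interval(self) -> float:
        # The client renews at about half a lease duration.
        return self.lease_sec / 2


freebsd = LeaseModel(FREEBSD_LEASE_SEC, expire_only_on_conflict=True)
linux = LeaseModel(LINUX_LEASE_SEC, expire_only_on_conflict=False)
```

With this model, a partition of a few seconds never yields an expiry, and even a
300sec partition only does so against the FreeBSD-style server if a conflicting
request arrives in the meantime.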
I think the only way to know what caused this would be to have a packet
capture that started before the problem occurred. (Maybe your network setup is
somehow directing some RPC messages to the wrong place or they're being blocked
by some firewall setup.) If you have an NFSv4.0 mount you should see a Renew RPC
about once per minute (half a lease duration) which keeps the lease from expiring.
For NFSv4.1, it is an RPC with just a Sequence operation which should have the
Reproducing this shouldn't be easy (which is a good thing;-).
It has been a while, but it should take something like:
- network partition a client from the server while it has a file open, for several
minutes. (It might also need to have a byte range lock on the file, I can't remember
for sure if just an open is sufficient.)
- Try and open the same file on another client (and get a conflicting byte range lock maybe).
--> This should result in a reply to the client of NFS4ERR_EXPIRED.
If you look at a packet trace in wireshark, it is a server reply of NFS4ERR_EXPIRED
that tells the client to go into this recovery cycle.
Unfortunately I am away from home until April, so I don't have access to wireshark
until then. (I will try and reproduce an NFSERR_EXPIRED failure with the laptops
I have with me, but I'm not sure if I can pull it off.)
Btw, this type of recovery isn't specified by the RFC and can only recover opens
and not byte range locks.
Fyi, the recovery from a server reboot (or reload of nfsd.ko in a FreeBSD server) is
specified by the RFC and can recover opens and locks. It starts with the server
replying NFSERR_STALECLIENTID or NFSERR_STALESTATEID.
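
As a rough summary of the two recovery paths, here is a small Python sketch that
maps a server reply onto the kind of recovery described above. The numeric values
are the NFS4ERR_EXPIRED, NFS4ERR_STALE_CLIENTID and NFS4ERR_STALE_STATEID codes
from the NFSv4 RFC; the function name and the returned dictionary layout are
hypothetical and do not correspond to anything in the FreeBSD client.

```python
# Status codes as defined by the NFSv4 RFC.
NFS4ERR_EXPIRED = 10011
NFS4ERR_STALE_CLIENTID = 10022
NFS4ERR_STALE_STATEID = 10023


def recovery_for(status):
    """Return a description of the recovery a reply would trigger (illustrative only)."""
    if status == NFS4ERR_EXPIRED:
        # Expired-lease recovery: not specified by the RFC, opens only.
        return {"kind": "expired-lease",
                "rfc_specified": False,
                "recovers": ("opens",)}
    if status in (NFS4ERR_STALE_CLIENTID, NFS4ERR_STALE_STATEID):
        # Server reboot (or nfsd.ko reload) recovery: specified by the RFC.
        return {"kind": "server-reboot",
                "rfc_specified": True,
                "recovers": ("opens", "locks")}
    # Any other status does not start one of these two recovery cycles.
    return None
```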
Good luck with it, rick
ps: I'll email if I reproduce the NFSERR_EXPIRED and find any problem beyond the
panic with the fix already posted.
> - --
> // Lev Serebryakov