NFSv4.1 hanging with NFSERR_BADSESSION
Colin Percival
cperciva at tarsnap.com
Sun Nov 13 20:22:41 UTC 2016
Hi all,
I'm trying to get FreeBSD to talk nicely to Amazon Elastic File System,
which is some flavour of NFSv4.1. (This is what prompted the earlier
discussion about umount and UDP RPCs.)
The latest problem I've run into is that partway through a buildworld I
get 32 I/O errors along with 'nfsv4 recover err returned 10052' (10052 is
NFSERR_BADSESSION) from the kernel, and thereafter anything touching that
NFS mount gets stuck in nfsclseq. (And kernels and filesystems being the
wonderful things they are, the processes which get stuck this way are
completely uninterruptible.)
My guess is that this is a bug along the lines of "there are 32 slots for NFS
protocol requests and when an error occurs they're not getting marked as
available, resulting in everything piling up waiting for a free slot which
will never arrive"... but I don't know NFS or our client code nearly well
enough to see where this might be.
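To make that guess concrete, here is a toy sketch in C of the suspected
failure mode. It is not the FreeBSD client code: slot_table, slot_get,
slot_put and do_rpc are made-up names, the table size of 32 is assumed from
the number of I/O errors seen, and the release_on_error flag exists only to
contrast the leaky and non-leaky paths.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSLOTS 32	/* assumed table size, taken from the 32 I/O errors */

/* Hypothetical session slot table: bit i set means slot i is in use. */
struct slot_table {
	uint32_t busy;
};

/*
 * Claim the lowest free slot. Returns -1 when every slot is busy; a real
 * client would sleep at this point until a slot is released.
 */
static int
slot_get(struct slot_table *t)
{
	for (int i = 0; i < NSLOTS; i++) {
		if ((t->busy & (UINT32_C(1) << i)) == 0) {
			t->busy |= UINT32_C(1) << i;
			return (i);
		}
	}
	return (-1);
}

/* Mark a slot as available again. */
static void
slot_put(struct slot_table *t, int slot)
{
	t->busy &= ~(UINT32_C(1) << slot);
}

/*
 * Hypothetical request path. server_error stands in for the status the
 * server returned (0 for success). With release_on_error == false the
 * error path returns without freeing its slot, which is the suspected bug.
 */
static int
do_rpc(struct slot_table *t, int server_error, bool release_on_error)
{
	int slot = slot_get(t);

	if (slot < 0)
		return (-1);		/* would block forever if nothing frees a slot */
	if (server_error != 0) {
		if (release_on_error)
			slot_put(t, slot);
		return (server_error);	/* leaky variant: slot stays busy */
	}
	slot_put(t, slot);
	return (0);
}
```

In this model, 32 calls that fail with 10052 and release_on_error set to
false leave every bit in busy set, so the next do_rpc() cannot get a slot.
That would match 32 errors followed by everything waiting indefinitely.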
Any ideas?
--
Colin Percival
Security Officer Emeritus, FreeBSD | The power to serve
Founder, Tarsnap | www.tarsnap.com | Online backups for the truly paranoid