Re: nfs stalls client: nfsrv_cache_session: no session

From: Peter <pmc_at_citylink.dinoex.sub.org>
Date: Sat, 16 Jul 2022 14:57:01 UTC
On Sat, Jul 16, 2022 at 01:43:11PM +0000, Rick Macklem wrote:
! Peter <pmc@citylink.dinoex.sub.org> wrote:
! > Hija,
! >  I have a problem with NFSv4:
! >
! > The configuration:
! >   Server Rel. 13.1-RC2
! >     nfs_server_enable="YES"
! >     nfs_server_flags="-u -t --minthreads 2 --maxthreads 20 -h ..."
! Allowing it to go down to 2 threads is very low. I've never even
! tried to run a server with less than 4 threads. Since kernel threads
! don't generate much overhead, I'd suggest replacing the
! minthreads/maxthreads with "-n 32" for a very small server.

Okay.
This is normally used for building ports, quarterly or so, and the
writes go to a local filesystem. Only when something doesn't build
and I start manual tests might the default /usr/ports NFS share get
the writes.
With Rel. 13 I should probably move the whole thing to virtio-9p
filesystems anyway, when I get the chance.
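
So the server's rc.conf would then become roughly this (untested,
just writing out your suggestion; the -h addresses elided as before):

    nfs_server_enable="YES"
    # fixed pool of 32 nfsd threads instead of --minthreads/--maxthreads
    nfs_server_flags="-u -t -n 32 -h ..."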
 
! >     mountd_enable="YES"
! >     mountd_flags="-S -p 803 -h ..."
! >     rpc_lockd_enable="YES"
! >     rpc_lockd_flags="-h ..."
! >     rpc_statd_enable="YES"
! >     rpc_statd_flags="-h ..."
! >     rpcbind_enable="YES"
! >     rpcbind_flags="-h ..."
! >     nfsv4_server_enable="YES"
! >     sysctl vfs.nfs.enable_uidtostring=1
! >     sysctl vfs.nfsd.enable_stringtouid=1
! > 
! >   Client bhyve Rel. 13.1-RELEASE on the same system
! >     nfs_client_enable="YES"
! >     nfs_access_cache="600"
! >     nfs_bufpackets="32"
! >     nfscbd_enable="YES"
! > 
! >   Mount-options: nfsv4,readahead=1,rw,async
! I would expect the behaviour you are seeing for "intr" and/or "soft"
! mounts, but since you are not using those, I don't know how you
! broke the session? (10052 is NFSERR_BADSESSION)
! You might want to do "nfsstat -m" on the client to see what options
! were actually negotiated for the mount and then check that neither
! "soft" nor "intr" are there.

I killed that client after I found no way out. Normally, nfsstat -m
for that mount looks like this:

nfsv4,minorversion=2,tcp,resvport,nconnect=1,hard,cto,sec=sys,acdirmin=3,
acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=65536,
wsize=65536,readdirsize=65536,readahead=1,wcommitsize=16777216,timeout=120,
retrans=2147483647
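
So "hard" is there and neither "soft" nor "intr" shows up. For a
quick check next time, while the mount is still alive, something
like this should stay silent (rough sketch, nothing more):

    # prints nothing when neither soft nor intr was negotiated
    nfsstat -m | grep -E 'soft|intr'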

! I suspect that the recovery thread in the client (called "nfscl") is
! somehow wedged and cannot do the recovery from the bad session,

These were present, two of them. I remember seeing the "D" flag,
but that seems to always be the case.
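
Next time I will try to capture more detail on those threads before
killing the client, along these lines (just a sketch; <pid> being
whatever ps reports for the wedged thread):

    # show the nfscl kernel threads and their state flags
    ps axH | grep nfscl
    # kernel stack of the suspect thread, to see where it sits
    procstat -kk <pid>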

! If increasing the number of nfsd threads in the server doesn't resolve
! the problem, I'd guess it is some network weirdness caused by how
! the bhyve instance is networked to its host. (I always use bridging
! for bhyve instances and do NFS mounts, but I don't work those
! mounts hard.)

They attach to a netgraph bridge:
https://gitr.daemon.contact/sysup/tree/subr_virt.sh#n84

! Btw, "umount -N <mnt_path>" on the client will normally get rid
! of a hung mount, although it can take a couple of minutes to complete.

Oops, I missed that! I only remembered -f, which didn't work.
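
Noted for next time. Assuming the share stays mounted on /usr/ports
in the client, that would be:

    # fail the outstanding RPCs and dismount the hung NFSv4 mount
    # (may take a couple of minutes, as you say)
    umount -N /usr/ports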

Thanks!
PMc