Re: FreeBSD 13.2 NFS client mount hangs

From: J David <j.david.lists_at_gmail.com>
Date: Fri, 06 Oct 2023 17:47:51 UTC

On Mon, Oct 2, 2023 at 7:08 PM Rick Macklem <rick.macklem@gmail.com> wrote:
> > The nfscbd daemon is not running on any of the clients.
> >
> > > If the Linux server still
> > > issues delegations
> >
> > How would I determine that?
> nfsstat -E -c
> and then look at the number under "Delegs". It is a current count of
> delegations, so if it remains 0 over time, no delegations are being issued.

But if this is done from a client that is not running nfscbd, isn't it
pretty well guaranteed to be zero?

Checking all the clients I can find, "Delegs" is zero on all of them.
On about half, "DelegRet" is nonzero but small (1-100); I don't know
what that counter is or whether it is related.

> I have attached a small patch which should make the NFS client handle
> this error correctly.

I will look for a way to try this patch, but the clients in this case
are all managed with freebsd-update and don't have enough disk space
to build a kernel locally, so it may be tricky.

> > > # tcpdump -s 0 -w out.pcap host <nfs-server-name>
> > > Let this run for a while and then pull out.pcap into wireshark and see what
> > > traffic is going between the NFS client and server.
> > > (Unlike tcpdump, wireshark does know how to decode NFS properly.)
> >
> > If/when the issue happens again, I will attempt to do this and report back.

I am also working on getting access to Wireshark.

In the interim, it did happen again, so the best I can do is put a
little bit of tcpdump output here: https://pastebin.com/UDrphwr5 .

I can't vouch for tcpdump's decoding being "correct", but it does
mostly seem to decode the NFS packets.

The client seems to loop through the same couple of actions, with long
delays (15 seconds) between retries:

This sequence:
+0.0000s: Client -> server xid 1205841201 getattr fh 0,7/2 ("Getattr"
in packet body)
+1.4106s: Client -> server xid 1205841202 getattr fh 0,5/2 ("Renew" in
packet body)
+0.0002s: Server -> client xid 1205841202 getattr LNK 12231267145 ids
1/53 sz 0 ("Renew" in packet body)
+3.8001s: Server -> client xid 1205841201 getattr ERROR: Request
couldn't be completed in time ("Getattr" in packet body)

Repeats after 15 seconds:
+15.0090s: Client -> server xid 1205841203 getattr fh 0,7/2 ("Getattr"
in packet body)
... etc

The "fh 0,7/2" and "fh 0,5/2" seem to be consistent each time. The xid
(transaction/request ID?) increments each time.
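
To make the request/reply pairing above easier to eyeball, here is a
minimal Python sketch that matches calls to replies by xid and reports
how long each call stayed outstanding. It assumes the "+N.NNNNs" values
in the summary are deltas from the previous packet (not absolute times),
and the tuple layout and the name reply_latencies are made up for
illustration; none of this comes from tcpdump itself.

```python
from itertools import accumulate

# (seconds since previous packet, kind, xid), transcribed by hand from
# the four-packet sequence summarized above. Treating the offsets as
# per-packet deltas is an assumption.
PACKETS = [
    (0.0000, "call", 1205841201),   # "Getattr" in packet body
    (1.4106, "call", 1205841202),   # "Renew" in packet body
    (0.0002, "reply", 1205841202),  # reply to the Renew
    (3.8001, "reply", 1205841201),  # error reply to the Getattr
]


def reply_latencies(packets):
    """Return {xid: seconds from call to reply}; unanswered calls map to None."""
    # Turn the per-packet deltas into running timestamps.
    times = accumulate(delta for delta, _, _ in packets)
    sent = {}
    latency = {}
    for now, (_, kind, xid) in zip(times, packets):
        if kind == "call":
            sent[xid] = now
            latency.setdefault(xid, None)
        elif xid in sent:
            latency[xid] = now - sent[xid]
    return latency


LATENCIES = reply_latencies(PACKETS)
for xid, secs in LATENCIES.items():
    print(xid, "unanswered" if secs is None else f"{secs:.4f}s")
```

Under that delta assumption, the Renew is answered almost immediately
while the Getattr waits several seconds before the error reply comes
back, which matches how the trace reads by eye.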

Maybe that will provide a lucky flash of insight in the interim.

Thanks!