[Bug 261291] ESX NFS4.1 client hangs, server never responds to EXCHANGE_ID/CREATE_SESSION

From: <bugzilla-noreply_at_freebsd.org>
Date: Tue, 08 Feb 2022 22:43:15 UTC

--- Comment #13 from Rick Macklem <rmacklem@FreeBSD.org> ---
Ok, I looked and this time the packet trace makes it clear.
The ESXi client is broken.

If you look at the seqid after the clientid in packet #477
(the EXCHANGE_ID reply), you will see it is 0.

If you look at the seqid in the CREATE_SESSION request in packet #478,
you will see it is 39.

They must be the same, as noted by the RFC...

      Each client ID serializes CREATE_SESSION via a per-client ID
      sequence number (see Section 18.36.4).  The corresponding result
      is csr_sequence, which MUST be equal to csa_sequence.

The RFC does not seem to make it crystal clear what the error reply
should be. The FreeBSD server uses NFS4ERR_STALE_CLIENTID.
If the VMware engineers argued that it should be a different error,
that could be changed in the FreeBSD server code. Since they send a
bogus seqid argument, I doubt it matters which error the FreeBSD
server sends in the reply.
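
As a rough illustration of the check described above, here is a minimal
sketch in Python. It is not the FreeBSD server code: the ClientRecord
type and the create_session() function are hypothetical names, errors
are modelled as plain strings, and replay handling for a repeated
CREATE_SESSION is left out.

```python
from dataclasses import dataclass


@dataclass
class ClientRecord:
    """Hypothetical per-client-ID state kept by the server."""
    clientid: int
    seqid: int  # sequence id the server returned in the EXCHANGE_ID reply


def create_session(record: ClientRecord, csa_sequence: int) -> str:
    """Accept CREATE_SESSION only if the client's seqid matches ours."""
    if csa_sequence != record.seqid:
        # Error the comment says the FreeBSD server uses for a mismatch.
        return "NFS4ERR_STALE_CLIENTID"
    # Simplification: advance the seqid for the next CREATE_SESSION.
    record.seqid += 1
    return "NFS4_OK"


if __name__ == "__main__":
    rec = ClientRecord(clientid=1, seqid=0)  # EXCHANGE_ID reply had seqid 0
    print(create_session(rec, 39))  # what ESXi sent in packet #478
    print(create_session(rec, 0))   # what a conforming client would send
```

With the values from the trace (0 in the reply, 39 in the request), the
first call is rejected and the second one succeeds.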

This is basically the same bug that shows up when multiple trunks
(multiple TCP connections) are used by the client. In that case,
CREATE_SESSION usually succeeded after several hundred tries, once the
seqid values happened to be the same.

If you have a maintenance contract with VMware, you can try reporting
the bug to them. I have no way of doing so.

sense to me. If there is a problem with a TCP connection, other clients
(Linux and FreeBSD, plus others) create a new TCP connection.
They only do the above commands if the server replies NFS4ERR_EXPIRED.
