[Bug 251347] NFS hangs on client side when mounted from outside in Jail Tree (BROKEN NFS SERVER OR MIDDLEWARE)

From: <bugzilla-noreply_at_freebsd.org>
Date: Wed, 08 Sep 2021 00:56:28 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=251347

--- Comment #13 from Rick Macklem <rmacklem@FreeBSD.org> ---
Ok, let me try to explain what the "...BROKEN NFS SERVER OR MIDDLEWARE"
message means. There are certain file attributes, such as fileno
(think i-node#), that should *never change*.
When the NFS client receives file attributes where fileno for a
given file has changed, it knows something is "badly broken".
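
As a rough sketch of that check (not the actual FreeBSD client code;
the names Attrs, AttrCache and FilenoChanged are made up for
illustration), a client could remember the fileno it first saw for
each file handle and complain when a later reply disagrees:

```python
from dataclasses import dataclass


@dataclass
class Attrs:
    """A few file attributes as returned by a Getattr (illustrative subset)."""
    fileno: int
    size: int
    mtime: float


class FilenoChanged(Exception):
    """Raised when the fileno reported for a known file handle changes."""


class AttrCache:
    """Hypothetical client-side attribute cache keyed by file handle."""

    def __init__(self):
        self._by_handle = {}

    def update(self, handle: bytes, attrs: Attrs) -> None:
        old = self._by_handle.get(handle)
        # size and mtime may change at any time; fileno must not.
        if old is not None and old.fileno != attrs.fileno:
            raise FilenoChanged(
                f"fileno changed from {old.fileno} to {attrs.fileno}: "
                "broken NFS server or middleware?"
            )
        self._by_handle[handle] = attrs
```

Only fileno is compared because the other attributes change
legitimately whenever the file is written.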

One cause of this was a middleware box (hardware/software that
sits between the NFS client and NFS server in the network
infrastructure) that could fail.
- This "middleware box" cached NFS requests/replies. If it saw
  a Getattr request from the NFS client for a file whose attributes
  it had already cached, it replied with the cached attributes.
  --> This reduced NFS server load, since the NFS server never saw
      the Getattr RPC request.
Such a technology existed and would sometimes reply with bogus
attributes for a different file. What was this device called?
I have no idea. The guy who told me about this gave no details
w.r.t. vendor/product/... (I assumed he was under NDA and could
not disclose details beyond this broken device generating the above
problem.)
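
To illustrate the idea only: the sketch below models a box that
answers Getattr from a cache keyed on the file handle. How the real
device failed is not known; the truncated-key bug shown here is a
purely hypothetical example of how such a cache could hand back
attributes for a different file. All names are invented.

```python
class CachingMiddlebox:
    """Toy model of a box that caches Getattr replies (hypothetical)."""

    def __init__(self, server_attrs, key_fn):
        self._server_attrs = server_attrs  # dict: file handle -> attributes
        self._key_fn = key_fn              # maps a file handle to a cache key
        self._cache = {}
        self.server_calls = 0              # Getattr RPCs that reached the server

    def getattr(self, handle):
        key = self._key_fn(handle)
        if key in self._cache:
            return self._cache[key]        # answered without asking the server
        self.server_calls += 1
        attrs = self._server_attrs[handle]
        self._cache[key] = attrs
        return attrs


server = {b"fh-0001": {"fileno": 101}, b"fh-0002": {"fileno": 202}}

# Correct keying: the whole file handle is the cache key.
good_box = CachingMiddlebox(server, key_fn=lambda fh: fh)
good_box.getattr(b"fh-0001")
good_box.getattr(b"fh-0001")          # second request is served from the cache

# Assumed bug, for illustration: only a prefix of the handle is used as
# the key, so two different files collide in the cache.
bad_box = CachingMiddlebox(server, key_fn=lambda fh: fh[:3])
bad_box.getattr(b"fh-0001")
bogus = bad_box.getattr(b"fh-0002")   # gets the attributes cached for fh-0001
```

With correct keying the server sees one Getattr instead of two, which
is the load reduction described above. With the assumed bug, the
client asking about the second file is handed the first file's fileno.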

Since it seems that the FreeBSD server is not broken in this regard
(I would see a lot more bug reports about this if it were), what
else might cause this to happen (i.e. fileno mysteriously changes)?
Here are some unlikely, but possible, theories:
- Flaky memory in the NFS server that sometimes flips a bit
  that happens to be used to store the "fileno" attribute.
- Flaky network interface transmit side that flips a bit before
  calculating the network checksum, so that the checksum is
  computed over the corrupted data and still verifies.
  --> It would seem that most garbled network packets would be
      caught by checksum failures, but checksums are not infallible.
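
The transmit-side theory can be sketched with the 16-bit ones'
complement Internet checksum (as described in RFC 1071). This is a
toy model, not real NIC or NFS code: send() and verify() are made-up
helpers and the payload is just a 64-bit fileno. It shows a bit
flipped before the checksum is calculated passing verification, and
one well-known blind spot of this checksum (reordered 16-bit words).

```python
import struct
from typing import Optional, Tuple


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum over data (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"                    # pad to a whole number of words
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:                     # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def send(payload: bytes, flip_bit: Optional[int] = None) -> Tuple[bytes, int]:
    """Model a transmitter that may corrupt one bit BEFORE checksumming."""
    if flip_bit is not None:
        buf = bytearray(payload)
        buf[flip_bit // 8] ^= 1 << (flip_bit % 8)
        payload = bytes(buf)
    return payload, internet_checksum(payload)


def verify(payload: bytes, checksum: int) -> bool:
    """Receiver-side check: recompute and compare."""
    return internet_checksum(payload) == checksum


fileno = 1234
wire, csum = send(struct.pack("!Q", fileno), flip_bit=0)
received_fileno = struct.unpack("!Q", wire)[0]
checksum_ok = verify(wire, csum)   # True, although received_fileno != fileno

# The checksum is also blind to swapped 16-bit words.
same = internet_checksum(b"\x00\x01\x00\x02") == internet_checksum(b"\x00\x02\x00\x01")
```
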
You may be able to dream up more, mostly within the network fabric
between the client and server.
Given how unlikely these latter possibilities are, you can see why
the known case of the "broken middleware box" gets mentioned in the
message.
