Re: NFS 4.2 "RPC struct is bad" revisited (with much more detail)

From: J David <j.david.lists_at_gmail.com>
Date: Mon, 09 Dec 2024 23:34:06 UTC

On Sat, Dec 7, 2024 at 5:42 PM Rick Macklem <rick.macklem@gmail.com> wrote:
> Finally, why would you assume that putting a fix in the FreeBSD
> client is somehow easier and less logistically time consuming
> compared to fixing a Linux server?

Because if you or I could come up with a workaround or a way to not
cache the bad response so it would at least retry sooner, I could
apply it and rebuild from source. I can't do that on Linux. If there's
a way to do that with a patch from linux-nfs folks on a Debian system
at all, I have no idea what would be involved or how to even begin.

A fix on their end would, most likely, have to go through the complete
release process from linux-nfs, the Linux kernel group, and then the
Debian project.

> (For example, have you looked hard for any evidence that there
> is a hardware issue w.r.t. that server?)

There is no evidence that there is a hardware issue. Nor is it just
one specific server or one client.  There are many clients and many
servers, and this can happen to any combination. This is just the case
where I was easily and reliably able to reproduce it. It's so reliable
I may even be able to reproduce it in a couple of VMs, which is what I
am waiting to have time to do before I reach out to linux-nfs.

I put the pcap file in a safe place and am happy to send you a copy. I
will do so as soon as I figure out where I put the safe place...

Thanks!