Re: NFS 4.2 "RPC struct is bad" revisited (with much more detail)
- In reply to: J David : "Re: NFS 4.2 "RPC struct is bad" revisited (with much more detail)"
Date: Thu, 12 Dec 2024 21:43:51 UTC
On Mon, Dec 9, 2024 at 3:34 PM J David <j.david.lists@gmail.com> wrote:
>
> On Sat, Dec 7, 2024 at 5:42 PM Rick Macklem <rick.macklem@gmail.com> wrote:
> > Finally, why would you assume that putting a fix in the FreeBSD
> > client is somehow easier and less logistically time consuming
> > compared to fixing a Linux server.
>
> Because if you or I could come up with a workaround or a way to not
> cache the bad response so it would at least retry sooner, I could
> apply it and rebuild from source. I can't do that on Linux. If there's
> a way to do that with a patch from linux-nfs folks on a Debian system
> at all, I have no idea what would be involved or how to even begin.
>
> A fix on their end would, most likely, have to go through the complete
> release process from linux-nfs, the Linux kernel group, and then the
> Debian project.
>
> > (For example, have you looked hard for any evidence that there
> > is a hardware issue w.r.t. that server?)
>
> There is no evidence that there is a hardware issue. Nor is it just
> one specific server or one client. There are many clients and many
> servers, and this can happen to any combination. This is just the case
> where I was easily and reliably able to reproduce it. It's so reliable
> I may even be able to reproduce it in a couple of VMs, which is what I
> am waiting to have time to do before I reach out to linux-nfs.
>
> I put the pcap file in a safe place and am happy to send you a copy. I
> will do so as soon as I figure out where I put the safe place...

Just to bring the list up to date...

J. David did send me a packet trace. The problem is that the "length of
the GETATTR bitmap" is a word of 0 instead of 2, although the 2 words of
bits and the associated attributes are in the reply on-the-wire.

This wouldn't be an obvious Linux knfsd bug. It might be some sort of
runaway pointer or use-after-free bug.

There is no way the FreeBSD client can easily know that the reply is
corrupted in this way, so I think reporting "RPC struct is bad" is
reasonable.

I have sent a patch to J. David that modifies the NFSv4 Readdir RPC
to not do a GETATTR after the READDIR. It might work for him, but I
do not consider it appropriate for FreeBSD at this time.

rick

>
> Thanks!
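
For readers following along, here is a minimal sketch of why a zero bitmap length makes the rest of the attribute data unparseable. It is not taken from the packet trace or from the FreeBSD client source. It assumes the usual XDR layout of an NFSv4 fattr4 (a bitmap4 sent as a word count followed by that many 32-bit words, then an attrlist4 sent as a byte length followed by the padded bytes). The function names, the BadRpcStruct exception, the bitmap values, and the 16-byte attribute payload are all hypothetical.

```python
import struct


class BadRpcStruct(Exception):
    """Hypothetical stand-in for the client's "RPC struct is bad" error."""


def encode_fattr4(mask_words, attr_vals, claimed_mask_len=None):
    """Encode bitmap4 (count + words) followed by attrlist4 (length + bytes).

    claimed_mask_len lets the sketch write a count that disagrees with the
    number of words actually emitted, mimicking the corrupted reply.
    """
    count = len(mask_words) if claimed_mask_len is None else claimed_mask_len
    out = struct.pack(">I", count)
    out += b"".join(struct.pack(">I", w) for w in mask_words)
    pad = (-len(attr_vals)) % 4  # XDR opaque data is padded to 4 bytes
    out += struct.pack(">I", len(attr_vals)) + attr_vals + b"\x00" * pad
    return out


def decode_fattr4(buf):
    """Decode what encode_fattr4 produced, trusting the count on the wire."""
    off = 0

    def u32():
        nonlocal off
        if off + 4 > len(buf):
            raise BadRpcStruct("ran out of data")
        (value,) = struct.unpack_from(">I", buf, off)
        off += 4
        return value

    count = u32()
    mask = [u32() for _ in range(count)]
    attr_len = u32()
    if off + attr_len > len(buf):
        raise BadRpcStruct("attribute length exceeds remaining data")
    return mask, buf[off:off + attr_len]


if __name__ == "__main__":
    words = [0x0010011A, 0x00B0A23A]  # hypothetical attribute bitmap
    vals = b"\x00" * 16               # hypothetical attribute bytes

    print(decode_fattr4(encode_fattr4(words, vals)))

    # Count says 0, but both bitmap words are still on the wire.
    bad = encode_fattr4(words, vals, claimed_mask_len=0)
    try:
        decode_fattr4(bad)
    except BadRpcStruct as err:
        print("decode failed:", err)
```

With the count set to 0, the decoder in this sketch reads no bitmap words and then treats the first bitmap word as the attribute length, so everything after that point is misaligned. Nothing in the data marks the zero count itself as wrong; the decoder only sees that what follows does not parse.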